[jira] [Commented] (LUCENE-5422) Postings lists deduplication

2014-03-20 Thread Vishmi Money (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942821#comment-13942821
 ] 

Vishmi Money commented on LUCENE-5422:
--

Hi [~dmitry_key], [~otis], [~mikemccand],
I was looking forward to a reply to the comment by [~mikemccand], as it would 
be a great support for me to have a mentor for this project; I could then 
directly get help from him/her in resolving the questions that come up as I 
proceed. I am kindly hoping for a positive answer.

Thank you.

> Postings lists deduplication
> 
>
> Key: LUCENE-5422
> URL: https://issues.apache.org/jira/browse/LUCENE-5422
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Dmitry Kan
>  Labels: gsoc2014
>
> The context:
> http://markmail.org/thread/tywtrjjcfdbzww6f
> Robert Muir and I have discussed what Robert eventually named "postings
> lists deduplication" at Berlin Buzzwords 2013 conference.
> The idea is to allow multiple terms to point to the same postings list to
> save space. This can be achieved by a new index codec implementation, but this 
> jira is open to other ideas as well.
> The application / impact of this is positive for synonyms, exact / inexact
> terms, leading wildcard support via storing reversed terms, etc.
> For example, at the moment, when supporting exact (unstemmed) and inexact 
> (stemmed) searches, we store both the unstemmed and stemmed variants of a 
> word form, which bloats the index. For the same index-size reason we had to 
> remove leading wildcard support via reversing a token at index and query 
> time.
> Comment from Mike McCandless:
> Neat idea!
> Would this idea allow a single term to point to (the union of) N other
> posting lists?  It seems like that's necessary e.g. to handle the
> exact/inexact case.
> And then, to produce the Docs/AndPositionsEnum you'd need to do the
> merge sort across those N posting lists?
> Such a thing might also be do-able as runtime only wrapper around the
> postings API (FieldsProducer), if you could at runtime do the reverse
> expansion (e.g. stem -> all of its surface forms).
> Comment from Robert Muir:
> I think the exact/inexact case is trickier (detecting it would be the hard
> part), and you are right, another solution might work better.
> But for the reverse-wildcard and synonym situations, it seems we could even
> detect it on write if we created some hash of the previous terms' postings.
> If the hash matches for the current term, we know it might be a "duplicate"
> and would have to actually do the costly check that they are the same.
> Maybe there are better ways to do it, but it might be a fun posting format
> experiment to try.
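The write-time hash idea above can be sketched outside Lucene with plain Java. This is a toy illustration only (class and method names are invented, and postings are bare int arrays rather than real codec structures): hash each term's postings as it is written, and on a hash collision fall back to the costly exact comparison before sharing the list.

```java
import java.util.*;

// Toy sketch of write-time postings deduplication: hash each term's postings,
// and on a hash match do the full (costly) equality check before sharing.
public class PostingsDedup {
    // term -> the postings list it points at (possibly shared with other terms)
    private final Map<String, int[]> termPostings = new LinkedHashMap<>();
    // postings hash -> previously written lists with that hash
    private final Map<Integer, List<int[]>> byHash = new HashMap<>();

    public void addTerm(String term, int[] postings) {
        int h = Arrays.hashCode(postings);
        for (int[] candidate : byHash.computeIfAbsent(h, k -> new ArrayList<>())) {
            if (Arrays.equals(candidate, postings)) { // exact check on hash match
                termPostings.put(term, candidate);    // share the existing list
                return;
            }
        }
        byHash.get(h).add(postings);
        termPostings.put(term, postings);
    }

    /** True if both terms point at the same (shared) postings list. */
    public boolean shared(String a, String b) {
        return termPostings.get(a) == termPostings.get(b);
    }

    public static void main(String[] args) {
        PostingsDedup d = new PostingsDedup();
        d.addTerm("run", new int[]{1, 4, 7});
        d.addTerm("running", new int[]{1, 4, 7}); // duplicate postings -> shared
        d.addTerm("walk", new int[]{2, 3});
        System.out.println(d.shared("run", "running")); // prints "true"
        System.out.println(d.shared("run", "walk"));    // prints "false"
    }
}
```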



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (LUCENE-5544) exceptions during IW.rollback can leak files and locks

2014-03-20 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5544:
---

Comment: was deleted

(was: Apologies for the multiple comments -- what did you mean by {{// don't 
call ensureOpen here: this acts like "close()" in closeable.}}? That the app 
can call rollback() multiple times? Because currently it can't, since writeLock 
is set to null by the first call, and the second call will try to sync on a null 
instance and hit an NPE?)

> exceptions during IW.rollback can leak files and locks
> --
>
> Key: LUCENE-5544
> URL: https://issues.apache.org/jira/browse/LUCENE-5544
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.8, 5.0, 4.7.1
>
> Attachments: LUCENE-5544.patch
>
>
> Today, rollback() doesn't always succeed: if it does, it closes the writer 
> nicely. Otherwise, if it hits an exception, it leaves you with a half-broken 
> writer, still potentially holding file handles and the write lock.
> This is especially bad if you use Native locks, because you are kind of 
> hosed: the static map prevents you from forcefully unlocking (e.g. 
> IndexWriter.unlock), so you have no real course of action to try to recover.
> If rollback() hits an exception, it should still deliver the exception, but 
> release things (e.g. like IOUtils.close).






[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks

2014-03-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942805#comment-13942805
 ] 

Robert Muir commented on LUCENE-5544:
-

You can definitely call it multiple times, and some tests in fact do just that. 
That's why IOUtils.close() is used, which does nothing on a null parameter.
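The null-tolerant close behavior described here can be shown with a small plain-JDK sketch (not the Lucene source; the class is invented for illustration): because the close helper ignores null, a second rollback() that sees the already-nulled lock field is a harmless no-op instead of an NPE.

```java
import java.io.Closeable;
import java.io.IOException;

// Toy illustration of why a null-tolerant close (like IOUtils.close()) makes
// rollback() safe to call more than once: the second call sees a null field
// and simply does nothing.
public class IdempotentRollback {
    private Closeable writeLock = () -> {};

    public void rollback() throws IOException {
        closeIfNonNull(writeLock); // no-op once the field is null
        writeLock = null;
    }

    private static void closeIfNonNull(Closeable c) throws IOException {
        if (c != null) {  // the per-argument null check IOUtils.close() performs
            c.close();
        }
    }

    public static void main(String[] args) throws IOException {
        IdempotentRollback w = new IdempotentRollback();
        w.rollback();
        w.rollback(); // second call is a harmless no-op, no NPE
        System.out.println("ok");
    }
}
```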







[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks

2014-03-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942804#comment-13942804
 ] 

Shai Erera commented on LUCENE-5544:


Apologies for the multiple comments -- what did you mean by {{// don't call 
ensureOpen here: this acts like "close()" in closeable.}}? That the app can 
call rollback() multiple times? Because currently it can't, since writeLock is 
set to null by the first call, and the second call will try to sync on a null 
instance and hit an NPE?







[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks

2014-03-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942802#comment-13942802
 ] 

Shai Erera commented on LUCENE-5544:


bq. Just thinking about making the test more evil.

Though if the exception happens in Lock.close(), the lock will still exist 
and the test will fail when asserting that the writer isn't locked. It's a valid 
exception, but nothing we can do about it while calling rollback(). So maybe 
exclude it from the list of allowed places to fail.

Do you think it's better not to swallow the exceptions in the finally part, but 
to add them as suppressed to any original exception? Because if e.g. lock.close() 
fails, the app won't be able to open a new writer, yet all the information it has 
is the original exception that happened during rollback(), with no indication 
that the lock couldn't be released either.
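The suppressed-exception suggestion can be sketched with plain JDK code (stub types, not the actual IndexWriter internals; the method names here are invented): a cleanup failure in the finally part is attached to the original rollback failure via Throwable.addSuppressed, so the app sees both.

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of propagating the rollback failure while still reporting a failed
// lock release as a suppressed exception.
public class RollbackWithSuppressed {
    static void rollback(Runnable body, Closeable lock) throws IOException {
        IOException original = null;
        try {
            body.run();
        } catch (RuntimeException e) {
            original = new IOException("rollback failed", e);
        } finally {
            try {
                lock.close();
            } catch (IOException cleanupFailure) {
                if (original != null) {
                    original.addSuppressed(cleanupFailure); // keep both failures
                } else {
                    original = cleanupFailure;
                }
            }
        }
        if (original != null) {
            throw original;
        }
    }

    public static void main(String[] args) {
        try {
            rollback(() -> { throw new RuntimeException("boom"); },
                     () -> { throw new IOException("lock stuck"); });
        } catch (IOException e) {
            System.out.println(e.getSuppressed().length); // prints "1"
        }
    }
}
```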







[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks

2014-03-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942795#comment-13942795
 ] 

Robert Muir commented on LUCENE-5544:
-

{quote}
About the test, maybe instead of asserting that IW.isLocked == false, try to 
open a new IW? I guess it will fail if you remove the stuff that you added to 
the finally clause? That will guarantee that we test what the app is likely to 
do after calling rollback().
{quote}

Well, the current test doesn't even need that assert: it's just for clarity. We 
don't need an assert for this stuff at all: the last line of directory.close() 
(MDW) will fail if there are open locks or files!

{quote}
And also, do you think it's better to use MDW.failOn to randomly fail if we're 
somewhere in rollback() stack? Cause currently the test fails only in one of 
two places. Just thinking about making the test more evil.
{quote}

This is a good idea. 








[jira] [Commented] (SOLR-5228) Don't require &lt;field&gt; or &lt;dynamicField&gt; be inside of &lt;fields&gt; -- or that &lt;fieldType&gt; be inside of &lt;types&gt;

2014-03-20 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942792#comment-13942792
 ] 

Shawn Heisey commented on SOLR-5228:


The schema version changes how Solr interprets default settings.  I'm fairly 
sure that it has nothing to do with the XML structure.  I don't think we need a 
new schema version for this.

+1 to Robert's idea in the first comment.  I will restate it below to make sure 
I understand it properly:

 * Allow &lt;field&gt; and &lt;fieldType&gt; at the top level under &lt;schema&gt;.
 * Deprecate &lt;fields&gt; and &lt;types&gt; in 4x. Remove them in trunk. The unknown 
tags will fail parsing.
 * Don't worry about supporting all options in the deprecated sections.
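Under that proposal, a schema could declare types and fields side by side. A hypothetical sketch of such a flattened schema.xml (the element and attribute names follow existing schema.xml conventions; the layout without the wrapper sections is the proposed part):

```xml
<schema name="example" version="1.5">
  <!-- fieldType and field at the top level, no <types>/<fields> wrappers -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>

  <uniqueKey>id</uniqueKey>
</schema>
```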


> Don't require &lt;field&gt; or &lt;dynamicField&gt; be inside of &lt;fields&gt; -- or that 
> &lt;fieldType&gt; be inside of &lt;types&gt;
> -
>
> Key: SOLR-5228
> URL: https://issues.apache.org/jira/browse/SOLR-5228
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Hoss Man
>Assignee: Hoss Man
>
> On the solr-user mailing list, Nutan recently mentioned spending days trying 
> to track down a problem that turned out to be because he had attempted to add 
> a {{&lt;field/&gt;}} that was outside of the {{&lt;fields&gt;}} block in his 
> schema.xml -- Solr was just silently ignoring it.
> We have made improvements in other areas of config validation by generating 
> startup errors when tags/attributes are found that are not expected -- but in 
> this case I think we should just stop expecting/requiring that the 
> {{&lt;fields&gt;}} and {{&lt;types&gt;}} tags will be used to group these sorts of 
> things. I think schema.xml parsing should just start ignoring them and only 
> care about finding the {{&lt;field/&gt;}}, {{&lt;dynamicField/&gt;}}, and {{&lt;fieldType/&gt;}} 
> tags wherever they may be.
> If people want to keep using them, fine.  If people want to mix fieldTypes 
> and fields side by side (perhaps specify a fieldType, then list all the 
> fields using it) fine.  I don't see any value in forcing people to use them, 
> but we definitely shouldn't leave things the way they are with otherwise 
> perfectly valid field/type declarations being silently ignored.
> ---
> I'll take this on unless I see any objections.






[jira] [Commented] (LUCENE-5544) exceptions during IW.rollback can leak files and locks

2014-03-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942789#comment-13942789
 ] 

Shai Erera commented on LUCENE-5544:


Patch looks good. So basically, with this patch, the state of IW after 
rollback() is that it's always closed and doesn't leak any important resources 
like write.lock and pooled readers. And there's no way to continue using the 
instance - the app must create a new IW instance. We can still end up with a 
segments_N file in the directory, though (if its deletion failed), but I guess 
IW will detect it's corrupt and use the one from the previous commit.

About the test, maybe instead of asserting that IW.isLocked == false, try to 
open a new IW? I guess it will fail if you remove the stuff that you added to 
the finally clause? That will guarantee that we test what the app is likely to 
do after calling rollback().

And also, do you think it's better to use MDW.failOn to randomly fail if we're 
somewhere in the rollback() stack? Because currently the test fails only in one 
of two places. Just thinking about making the test more evil.







[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.7.0_51) - Build # 9755 - Failure!

2014-03-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9755/
Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 
-XX:-UseSuperWord

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.rest.schema.TestDynamicFieldCollectionResource

Error Message:
ERROR: SolrIndexSearcher opens=1 closes=3

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=1 closes=3
at __randomizedtesting.SeedInfo.seed([AF66F1D89DBCEAEB]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:405)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:176)
at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:789)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
at java.lang.Thread.run(Thread.java:744)




Build Log:
[...truncated 10443 lines...]
   [junit4] Suite: 
org.apache.solr.rest.schema.TestDynamicFieldCollectionResource
   [junit4]   2> 287789 T1125 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl 
(true) and clientAuth (true)
   [junit4]   2> 287790 T1125 oas.SolrTestCaseJ4.initCore initCore
   [junit4]   2> Creating dataDir: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build/solr-core/test/J1/./solrtest-TestDynamicFieldCollectionResource-1395375921711
   [junit4]   2> 287790 T1125 oas.SolrTestCaseJ4.initCore initCore end
   [junit4]   2> 287791 T1125 oejs.Server.doStart jetty-8.1.10.v20130312
   [junit4]   2> 287794 T1125 oejus.SslContextFactory.doStart Enabled Protocols 
[SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, 
TLSv1.1, TLSv1.2]
   [junit4]   2> 287797 T1125 oejs.AbstractConnector.doStart Started 
SslSelectChannelConnector@127.0.0.1:47178
   [junit4]   2> 287798 T1125 oass.SolrDispatchFilter.init 
SolrDispatchFilter.init()
   [junit4]   2> 287799 T1125 oasc.SolrResourceLoader.locateSolrHome JNDI not 
configured for solr (NoInitialContextEx)
   [junit4]   2> 287799 T1125 oasc.SolrResourceLoader.locateSolrHome using 
system property solr.solr.home: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/core/src/test-files/solr
   [junit4]   2> 287799 T1125 oasc.SolrResourceLoader. new 
SolrResourceLoader for directory: 
'/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/core/src/test-files/solr/'
   [junit4]   2> 287813 T1125 oasc.ConfigSolr.fromFile Loading container 
configuration from 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/core/src/test-files/solr/solr.xml
   [junit4]   2> 287857 T1125 oasc.CoreContainer. New CoreContainer 
811381485
   [junit4]   2> 287857 T1125 oasc.CoreContainer.load Loading cores into 
CoreContainer 
[instanceDir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/core/src/test-files/solr/]
   [junit4]   2> 287858 T1125 oashc.HttpShardHandlerFactory.getParameter 
Setting socketTimeout to: 9
   [junit4]   2> 287859 T1125 oashc.HttpShardHandlerFactory.getParameter 
Setting urlScheme to: https
   [junit4]   2> 287859 T1125 oashc.HttpShardHandlerFactory.getParameter 

[jira] [Updated] (LUCENE-5544) exceptions during IW.rollback can leak files and locks

2014-03-20 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5544:


Attachment: LUCENE-5544.patch

Here's the start of a patch. Really, the current rollback code is too crazy: 
there is no need for it to call the super-scary closeInternal(false, false) at 
the end, when in this case all that huge, complicated piece of code does is 
call close() on IndexFileDeleter and release write.lock.
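The shape being described can be sketched with stub types (plain JDK, not the actual IndexWriter internals; the names are invented for illustration): rollback does its work, and a finally block releases the deleter and write lock no matter what, swallowing cleanup failures so the original exception still propagates, in the spirit of IOUtils.closeWhileHandlingException.

```java
import java.io.Closeable;
import java.io.IOException;

// Loose sketch of rollback that guarantees resource release even when the
// rollback work itself throws.
public class RollbackShape {
    static void rollback(Runnable rollbackInternal,
                         Closeable deleter, Closeable writeLock) throws IOException {
        try {
            rollbackInternal.run(); // may throw
        } finally {
            closeAll(deleter, writeLock); // released no matter what
        }
    }

    static void closeAll(Closeable... cs) {
        for (Closeable c : cs) {
            try {
                if (c != null) c.close();
            } catch (IOException ignored) {
                // swallow so the original exception (if any) propagates
            }
        }
    }

    public static void main(String[] args) {
        boolean[] released = new boolean[2];
        try {
            rollback(() -> { throw new RuntimeException("rollback hit exception"); },
                     () -> released[0] = true, () -> released[1] = true);
        } catch (RuntimeException expected) {
            System.out.println(released[0] && released[1]); // prints "true"
        }
    }
}
```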








[jira] [Created] (LUCENE-5544) exceptions during IW.rollback can leak files and locks

2014-03-20 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5544:
---

 Summary: exceptions during IW.rollback can leak files and locks
 Key: LUCENE-5544
 URL: https://issues.apache.org/jira/browse/LUCENE-5544
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.8, 5.0, 4.7.1


Today, rollback() doesn't always succeed: if it does, it closes the writer 
nicely. Otherwise, if it hits an exception, it leaves you with a half-broken 
writer, still potentially holding file handles and the write lock.

This is especially bad if you use Native locks, because you are kind of hosed: 
the static map prevents you from forcefully unlocking (e.g. IndexWriter.unlock), 
so you have no real course of action to try to recover.

If rollback() hits an exception, it should still deliver the exception, but 
release things (e.g. like IOUtils.close).








[jira] [Commented] (LUCENE-5542) Explore making DVConsumer sparse-aware

2014-03-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942754#comment-13942754
 ] 

Shai Erera commented on LUCENE-5542:


I don't think it makes the API more complicated. To the users of the API we say 
"pass only docs with values". To the Codec developers we say "you are going to 
get only docs with values, so encode however you see fit such that you can 
later provide docsWithFields efficiently". It's not about performance yet, but 
about making the API clear (in my opinion) - stating that {{null}} denotes a 
missing value for a document is not better than just not passing the document 
in the first place.
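The API shape under discussion can be sketched with plain Java (the names here are illustrative, not the real DocValuesConsumer API): the caller hands over only (docID, value) pairs, and the consumer derives docsWithField itself instead of receiving nulls for missing docs.

```java
import java.util.*;

// Toy sketch of a sparse-aware consumer: only docs with values are passed in,
// and the "codec" records which docs have a field alongside their values.
public class SparseDVSketch {
    static Map.Entry<int[], long[]> encode(Iterable<Map.Entry<Integer, Long>> docValues) {
        List<Integer> docs = new ArrayList<>();
        List<Long> values = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : docValues) { // only docs that have a value
            docs.add(e.getKey());
            values.add(e.getValue());
        }
        int[] docsWithField = docs.stream().mapToInt(Integer::intValue).toArray();
        long[] vals = values.stream().mapToLong(Long::longValue).toArray();
        return Map.entry(docsWithField, vals);
    }

    public static void main(String[] args) {
        // docs 0 and 5 have values; docs 1-4 are simply absent, no nulls needed
        var result = encode(List.of(Map.entry(0, 7L), Map.entry(5, 42L)));
        System.out.println(Arrays.toString(result.getKey()));   // prints "[0, 5]"
        System.out.println(Arrays.toString(result.getValue())); // prints "[7, 42]"
    }
}
```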

> Explore making DVConsumer sparse-aware
> --
>
> Key: LUCENE-5542
> URL: https://issues.apache.org/jira/browse/LUCENE-5542
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Shai Erera
>
> Today the DVConsumer API requires the caller to pass a value for every document, 
> where {{null}} means "this doc has no value". The Codec can then choose how 
> to encode the values, i.e. whether it encodes a 0 for a numeric field or 
> encodes the sparse docs. In practice, from what I see, we choose to encode 
> the 0s.
> I wonder if we e.g. added an {{Iterable}} to 
> DVConsumer.addXYZField(), if that would make a better API. The caller only 
> passes &lt;doc,value&gt; pairs and it's up to the Codec to decide how it wants to 
> encode the missing values. Like, if a user's app truly has a sparse NDV, 
> IndexWriter doesn't need to "fill the gaps" artificially. It's the job of the 
> Codec.
> To be clear, I don't propose to change any Codec implementation in this issue 
> (w.r.t. sparse encoding - yes/no), only to change the API to reflect that 
> sparseness. I think that if we ever want to encode sparse values, it will 
> be a more convenient API.
> Thoughts? I volunteer to do this work, but want to get others' opinion before 
> I start.






[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0) - Build # 9861 - Still Failing!

2014-03-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9861/
Java: 32bit/jdk1.8.0 -server -XX:+UseSerialGC

1 tests failed.
FAILED:  
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin 
{#2 seed=[7FF2374655968ABC:36A2C61D9F0BFDF1]}

Error Message:
Shouldn't match I#1:Pt(x=-48.0,y=-66.0) Q:Pt(x=-90.0,y=-76.0)

Stack Trace:
java.lang.AssertionError: Shouldn't match I#1:Pt(x=-48.0,y=-66.0) 
Q:Pt(x=-90.0,y=-76.0)
at 
__randomizedtesting.SeedInfo.seed([7FF2374655968ABC:36A2C61D9F0BFDF1]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:355)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:335)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin(SpatialOpRecursivePrefixTreeTest.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
at java.lang.Thread.run(Thread.java:744)




Build Log:
[...truncated 9167 lines...]
   [junit4] Suite: 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest
   [junit4]   1> Strategy: 
RecursivePrefixTreeStrategy(prefixGridScanLevel:-2,SPG:(QuadPrefixTree(maxLevels:2,ctx:SpatialContext{geo=fa

[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-03-20 Thread rulinma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942720#comment-13942720
 ] 

rulinma commented on SOLR-1301:
---

mark.

> Add a Solr contrib that allows for building Solr indexes via Hadoop's 
> Map-Reduce.
> -
>
> Key: SOLR-1301
> URL: https://issues.apache.org/jira/browse/SOLR-1301
> Project: Solr
>  Issue Type: New Feature
>Reporter: Andrzej Bialecki 
>Assignee: Mark Miller
> Fix For: 4.7, 5.0
>
> Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, 
> SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, 
> commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, 
> hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, 
> log4j-1.2.15.jar
>
>
> This patch contains  a contrib module that provides distributed indexing 
> (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is 
> twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of 
> OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS. 
> SolrOutputFormat consumes data produced by reduce tasks directly, without 
> storing it in intermediate files. Furthermore, by using an 
> EmbeddedSolrServer, the indexing task is split into as many parts as there 
> are reducers, and the data to be indexed is not sent over the network.
> Design
> --
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
> which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
> instantiates an EmbeddedSolrServer, and it also instantiates an 
> implementation of SolrDocumentConverter, which is responsible for turning 
> Hadoop (key, value) into a SolrInputDocument. This data is then added to a 
> batch, which is periodically submitted to EmbeddedSolrServer. When reduce 
> task completes, and the OutputFormat is closed, SolrRecordWriter calls 
> commit() and optimize() on the EmbeddedSolrServer.
> The API provides facilities to specify an arbitrary existing solr.home 
> directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories 
> as there were reduce tasks. The output shards are placed in the output 
> directory on the default filesystem (e.g. HDFS). Such part-N directories 
> can be used to run N shard servers. Additionally, users can specify the 
> number of reduce tasks, in particular 1 reduce task, in which case the output 
> will consist of a single shard.
> An example application is provided that processes large CSV files and uses 
> this API. It uses a custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
> issue, you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor 
> and approved for release under Apache License.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-03-20 Thread Da Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942684#comment-13942684
 ] 

Da Huang commented on LUCENE-4396:
--

A new iteration of the proposal has just been submitted. It adds a 
"Supplementary Notes" section describing how to fit my design to the new 
design on the current Lucene trunk, such as renaming BooleanScorer to 
BooleanBulkScorer and creating a new BooleanScorer that extends Scorer.

> BooleanScorer should sometimes be used for MUST clauses
> ---
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 100 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!






[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-20 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942635#comment-13942635
 ] 

Kranti Parisa commented on SOLR-4787:
-

So for any query you might return one or more EVENTS matching the title search 
terms + filters.

Say you have 30 events matching the given criteria, but your pagination is 1-10, 
so you would be displaying the top 10 most relevant EVENTS. This would be the 
docList of your first query. From the ResponseWriter you would then need to 
make a call to the TICKETS core, using the original filters + the 10 event ids, 
and execute that request (you might need to use LocalSolrQueryRequest and 
pre-processed filters etc. to hit the caches of the first query), then collect 
the field info you need for each EVENT.

From the join implementation's point of view, there is no such thing as fetching 
the values or scores from the secondCore; it would be very costly to do that. 
You would need to write some custom ResponseWriters etc. which do this stuff, 
especially considering your requirement of maintaining EVENTS and TICKETS 
separately. There is also a new Collapse/Expand results feature, but I am not 
sure about using it for your use case.
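A minimal sketch of the follow-up request described above, assuming a hypothetical event_id field on the TICKETS core and the {!terms} query parser; the real field name and request plumbing (LocalSolrQueryRequest, cache-friendly filters, etc.) will differ:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: after the first query returns the top-10 EVENT ids,
// build the filter for the follow-up request against the TICKETS core.
// The field name "event_id" and the use of {!terms} are assumptions for
// illustration; adapt them to the actual schema.
public class TicketsFilterSketch {
    static String ticketsFilter(List<String> eventIds) {
        // {!terms} produces a single filter matching any of the given ids
        return "{!terms f=event_id}" + String.join(",", eventIds);
    }

    public static void main(String[] args) {
        List<String> topEvents = Arrays.asList("e1", "e2", "e3");
        // This string would be added as an fq alongside the original filters.
        System.out.println(ticketsFilter(topEvents));
    }
}
```

The resulting string would be passed as one fq parameter of the TICKETS request, alongside the original filters, before collecting the per-EVENT field info.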



> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.8
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will set up a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlu
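The hjoin syntax description above is cut off. Independent of the exact syntax, the memory structure the description mentions, and the role of the *size* parameter in avoiding hashset resizing, can be sketched in plain Java (an illustration of the technique, not the actual plugin code):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the hjoin's core idea: collect the long join keys
// from the fromIndex results into a hash set, then filter main-query
// documents by membership. Presizing the set (as the "size" local parameter
// allows) avoids rehashing while the set is filled.
public class HashJoinSketch {
    static Set<Long> buildKeySet(long[] fromIndexKeys, int sizeHint) {
        // capacity chosen so the default 0.75 load factor is not exceeded
        Set<Long> keys = new HashSet<>((int) (sizeHint / 0.75f) + 1);
        for (long k : fromIndexKeys) {
            keys.add(k);
        }
        return keys;
    }

    static boolean matches(Set<Long> keys, long mainQueryKey) {
        return keys.contains(mainQueryKey);
    }

    public static void main(String[] args) {
        Set<Long> keys = buildKeySet(new long[] {10L, 42L, 7L}, 3);
        System.out.println(matches(keys, 42L)); // true
        System.out.println(matches(keys, 99L)); // false
    }
}
```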

Re: Analyzing primitive types, why can't we do this in Solr?

2014-03-20 Thread Alexandre Rafalovitch
Hi Erick,

Maybe work with me offline on that idea. Sounds interesting and I
would love to hear more details. There is more Solr popularization
stuff in works as well, so plenty of opportunities for - ugh -
synergies.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Mar 21, 2014 at 8:20 AM, Erick Erickson  wrote:
> Yeah, I finally got a little smart and clicked the hierarchy builder
> link in IntelliJ when resting on UpdateRequestProcessorFactory. I
> don't use the built-in IDE tools _nearly_ enough.
>
> That page looks great BTW. I'm going to follow a parallel path with
> the Solr docs that I think will complement yours, just a brief outline
> of what's there, similar to the "Analyzers and Tokenizers" page... If
> I find the time... Siigggh.
>
> Erick
>
> On Thu, Mar 20, 2014 at 5:53 PM, Alexandre Rafalovitch
>  wrote:
>> That "chain" issue is exactly why I built the web page above. That,
>> plus the Javadoc links all over the place.
>>
>> Next, I am working on a similar page for all Solr Analyzers,
>> Tokenizers and Filters. Should be ready soon.
>>
>> Regards,
>>Alex.
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Fri, Mar 21, 2014 at 7:50 AM, Erick Erickson  
>> wrote:
>>> Thanks Alexandre! That's what I _thought_ I remembered!
>>>
>>> It looks like I found all the extends for UpdateProcessorFactory, but
>>> didn't follow the chain through FieldMutatingUpdateProcessorFactory
>>> which would have found that one for me.
>>>
>>> Siiihhh.
>>>
>>> Thanks again,
>>> Erick
>>>
>>> On Thu, Mar 20, 2014 at 5:44 PM, Alexandre Rafalovitch
>>>  wrote:
 Do you mean like:
 http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
 ?
 https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570

 Regards,
Alex.
 P.s. Quick URP lookup comes to you courtesy of:
 http://www.solr-start.com/update-request-processor/4.6.1/ :-)

 On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson  
 wrote:
> I suppose for the special case of dates we could create a
> DateFormatProcessorFactory that just took a list of standard Java
> SimpleDateFormat strings and applied the first one that fit.



 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)




Re: Analyzing primitive types, why can't we do this in Solr?

2014-03-20 Thread Erick Erickson
Yeah, I finally got a little smart and clicked the hierarchy builder
link in IntelliJ when resting on UpdateRequestProcessorFactory. I
don't use the built-in IDE tools _nearly_ enough.

That page looks great BTW. I'm going to follow a parallel path with
the Solr docs that I think will complement yours, just a brief outline
of what's there, similar to the "Analyzers and Tokenizers" page... If
I find the time... Siigggh.

Erick

On Thu, Mar 20, 2014 at 5:53 PM, Alexandre Rafalovitch
 wrote:
> That "chain" issue is exactly why I built the web page above. That,
> plus the Javadoc links all over the place.
>
> Next, I am working on a similar page for all Solr Analyzers,
> Tokenizers and Filters. Should be ready soon.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Fri, Mar 21, 2014 at 7:50 AM, Erick Erickson  
> wrote:
>> Thanks Alexandre! That's what I _thought_ I remembered!
>>
>> It looks like I found all the extends for UpdateProcessorFactory, but
>> didn't follow the chain through FieldMutatingUpdateProcessorFactory
>> which would have found that one for me.
>>
>> Siiihhh.
>>
>> Thanks again,
>> Erick
>>
>> On Thu, Mar 20, 2014 at 5:44 PM, Alexandre Rafalovitch
>>  wrote:
>>> Do you mean like:
>>> http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
>>> ?
>>> https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570
>>>
>>> Regards,
>>>Alex.
>>> P.s. Quick URP lookup comes to you courtesy of:
>>> http://www.solr-start.com/update-request-processor/4.6.1/ :-)
>>>
>>> On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson  
>>> wrote:
 I suppose for the special case of dates we could create a
 DateFormatProcessorFactory that just took a list of standard Java
 SimpleDateFormat strings and applied the first one that fit.
>>>
>>>
>>>
>>> Personal website: http://www.outerthoughts.com/
>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>> - Time is the quality of nature that keeps events from happening all
>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>> book)
>>>



[jira] [Resolved] (LUCENE-4984) Fix ThaiWordFilter

2014-03-20 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4984.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.8

> Fix ThaiWordFilter
> --
>
> Key: LUCENE-4984
> URL: https://issues.apache.org/jira/browse/LUCENE-4984
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch
>
>
> ThaiWordFilter is an offender in TestRandomChains because it creates 
> positions and updates offsets.






[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter

2014-03-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942617#comment-13942617
 ] 

ASF subversion and git services commented on LUCENE-4984:
-

Commit 1579855 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1579855 ]

LUCENE-4984: Fix ThaiWordFilter, smartcn WordTokenFilter

> Fix ThaiWordFilter
> --
>
> Key: LUCENE-4984
> URL: https://issues.apache.org/jira/browse/LUCENE-4984
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch
>
>
> ThaiWordFilter is an offender in TestRandomChains because it creates 
> positions and updates offsets.






Re: Analyzing primitive types, why can't we do this in Solr?

2014-03-20 Thread Alexandre Rafalovitch
That "chain" issue is exactly why I built the web page above. That,
plus the Javadoc links all over the place.

Next, I am working on a similar page for all Solr Analyzers,
Tokenizers and Filters. Should be ready soon.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Mar 21, 2014 at 7:50 AM, Erick Erickson  wrote:
> Thanks Alexandre! That's what I _thought_ I remembered!
>
> It looks like I found all the extends for UpdateProcessorFactory, but
> didn't follow the chain through FieldMutatingUpdateProcessorFactory
> which would have found that one for me.
>
> Siiihhh.
>
> Thanks again,
> Erick
>
> On Thu, Mar 20, 2014 at 5:44 PM, Alexandre Rafalovitch
>  wrote:
>> Do you mean like:
>> http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
>> ?
>> https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570
>>
>> Regards,
>>Alex.
>> P.s. Quick URP lookup comes to you courtesy of:
>> http://www.solr-start.com/update-request-processor/4.6.1/ :-)
>>
>> On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson  
>> wrote:
>>> I suppose for the special case of dates we could create a
>>> DateFormatProcessorFactory that just took a list of standard Java
>>> SimpleDateFormat strings and applied the first one that fit.
>>
>>
>>
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>



[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter

2014-03-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942599#comment-13942599
 ] 

ASF subversion and git services commented on LUCENE-4984:
-

Commit 1579853 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1579853 ]

LUCENE-4984: actually pass down the AttributeFactory to superclass

> Fix ThaiWordFilter
> --
>
> Key: LUCENE-4984
> URL: https://issues.apache.org/jira/browse/LUCENE-4984
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch
>
>
> ThaiWordFilter is an offender in TestRandomChains because it creates 
> positions and updates offsets.






Re: Google Summer of Code

2014-03-20 Thread Alexandre Rafalovitch
What does it take to be a mentor? I have a couple of Solr ideas I
would be happy to mentor someone on. But do mentors have to sign
agreements, be part of Apache formally, etc?

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Mar 20, 2014 at 10:12 PM, Michael McCandless
 wrote:
> Unfortunately, the only two GSoC mentors we seem to have this year are
> David Smiley and myself, and we are each already signed up to mentor
> one student, and there are at least two other students expressing
> interest in different issues.
>
> So it looks like we have too many students and too few mentors.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Mar 20, 2014 at 9:50 AM, Furkan KAMACI  wrote:
>> Hi;
>>
>> I want to apply for Google Summer of Code if I can still meet the deadline.
>> I've checked the issues. I want to ask: is there any issue which is
>> labeled for GSoC, has a volunteer mentor, but nobody has applied? Because I
>> see that there are comments at some issues asking about volunteer
>> mentors. If there is any such issue, I would be happy to work on it.
>>
>> Thanks;
>> Furkan KAMACI
>



Re: Analyzing primitive types, why can't we do this in Solr?

2014-03-20 Thread Erick Erickson
Thanks Alexandre! That's what I _thought_ I remembered!

It looks like I found all the extends for UpdateProcessorFactory, but
didn't follow the chain through FieldMutatingUpdateProcessorFactory
which would have found that one for me.

Siiihhh.

Thanks again,
Erick

On Thu, Mar 20, 2014 at 5:44 PM, Alexandre Rafalovitch
 wrote:
> Do you mean like:
> http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
> ?
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570
>
> Regards,
>Alex.
> P.s. Quick URP lookup comes to you courtesy of:
> http://www.solr-start.com/update-request-processor/4.6.1/ :-)
>
> On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson  
> wrote:
>> I suppose for the special case of dates we could create a
>> DateFormatProcessorFactory that just took a list of standard Java
>> SimpleDateFormat strings and applied the first one that fit.
>
>
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>



Re: Analyzing primitive types, why can't we do this in Solr?

2014-03-20 Thread Alexandre Rafalovitch
Do you mean like:
http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
?
https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1570

Regards,
   Alex.
P.s. Quick URP lookup comes to you courtesy of:
http://www.solr-start.com/update-request-processor/4.6.1/ :-)

On Fri, Mar 21, 2014 at 2:35 AM, Erick Erickson  wrote:
> I suppose for the special case of dates we could create a
> DateFormatProcessorFactory that just took a list of standard Java
> SimpleDateFormat strings and applied the first one that fit.
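A minimal sketch of that first-fit idea, assuming plain SimpleDateFormat patterns; a real implementation would live inside an UpdateRequestProcessorFactory, and the ParseDateFieldUpdateProcessorFactory linked above already covers this ground:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical sketch: try a list of SimpleDateFormat patterns in order
// and apply the first one that parses. Only the first-fit parsing step of
// the proposed factory is shown here.
public class FirstFitDateParser {
    static Date parseFirstFit(String value, List<String> patterns) {
        for (String p : patterns) {
            SimpleDateFormat fmt = new SimpleDateFormat(p);
            fmt.setLenient(false); // avoid a pattern "accidentally" matching
            try {
                return fmt.parse(value);
            } catch (ParseException ignored) {
                // this pattern did not fit; fall through to the next one
            }
        }
        return null; // no pattern fit; a real URP would leave the value as-is
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList("yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd");
        System.out.println(parseFirstFit("2014-03-20", patterns) != null);
    }
}
```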



Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)




[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1426 - Failure!

2014-03-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1426/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.component.DebugComponentTest

Error Message:
ERROR: SolrIndexSearcher opens=2 closes=3

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=2 closes=3
at __randomizedtesting.SeedInfo.seed([591D70B3D1E85AEF]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:420)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:179)
at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:789)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
at java.lang.Thread.run(Thread.java:744)




Build Log:
[...truncated 10667 lines...]
   [junit4] Suite: org.apache.solr.handler.component.DebugComponentTest
   [junit4]   2> 1612742 T7076 oas.SolrTestCaseJ4.startTrackingSearchers WARN 
startTrackingSearchers: numOpens=5 numCloses=4
   [junit4]   2> 1612742 T7076 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl 
(true) and clientAuth (false)
   [junit4]   2> 1612743 T7076 oas.SolrTestCaseJ4.initCore initCore
   [junit4]   2> Creating dataDir: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/J0/./solrtest-DebugComponentTest-1395359725304
   [junit4]   2> 1612744 T7076 oasc.SolrResourceLoader. new 
SolrResourceLoader for directory: 
'/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/'
   [junit4]   2> 1612745 T7076 oasc.SolrResourceLoader.replaceClassLoader 
Adding 
'file:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/lib/.svn/'
 to classloader
   [junit4]   2> 1612746 T7076 oasc.SolrResourceLoader.replaceClassLoader 
Adding 
'file:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/lib/classes/'
 to classloader
   [junit4]   2> 1612746 T7076 oasc.SolrResourceLoader.replaceClassLoader 
Adding 
'file:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/lib/README'
 to classloader
   [junit4]   2> 1612800 T7076 oasc.SolrConfig. Using Lucene 
MatchVersion: LUCENE_50
   [junit4]   2> 1612819 T7076 oasc.SolrConfig. Loaded SolrConfig: 
solrconfig.xml
   [junit4]   2> 1612821 T7076 oass.IndexSchema.readSchema Reading Solr Schema 
from schema.xml
   [junit4]   2> 1612824 T7076 oass.IndexSchema.readSchema [null] Schema 
name=test
   [junit4]   2> 1612947 T7076 oass.OpenExchangeRatesOrgProvider.init 
Initialized with rates=open-exchange-rates.json, refreshInterval=1440.
   [junit4]   2> 1612951 T7076 oass.IndexSchema.readSchema default search field 
in schema is text
   [junit4]   2> 1612952 T7076 oass.IndexSchema.readSchema unique key field: id
   [junit4]   2> 1612958 T7076 oass.FileExchangeRateProvider.reload Reloading 
exchange rates from file currency.xml
   [junit4]   2> 1612964 T7076 oass.FileExchangeRateProvi

[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter

2014-03-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942582#comment-13942582
 ] 

ASF subversion and git services commented on LUCENE-4984:
-

Commit 1579846 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1579846 ]

LUCENE-4984: Fix ThaiWordFilter, smartcn WordTokenFilter

> Fix ThaiWordFilter
> --
>
> Key: LUCENE-4984
> URL: https://issues.apache.org/jira/browse/LUCENE-4984
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch
>
>
> ThaiWordFilter is an offender in TestRandomChains because it creates 
> positions and updates offsets.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-03-20 Thread Da Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942577#comment-13942577
 ] 

Da Huang commented on LUCENE-4396:
--

I'm afraid that if BooleanBulkScorer also handles MUST, it couldn't make use of 
.advance(), as its subScorers are BulkScorers, which cannot call .advance().

> BooleanScorer should sometimes be used for MUST clauses
> ---
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 100 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
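The advancing behavior Mike describes above can be sketched in plain Java: let a sparse MUST clause drive iteration, and advance each SHOULD clause's cursor past any docs the MUST clause skipped. This is a toy model over sorted int arrays, not Lucene's actual BooleanScorer code; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MustDrivenScorer {

    /** Returns docs matching the MUST list, with a count of SHOULD matches per doc. */
    static List<int[]> score(int[] must, int[][] shoulds) {
        List<int[]> hits = new ArrayList<>();       // each entry: {docID, shouldMatchCount}
        int[] positions = new int[shoulds.length];  // cursor into each SHOULD postings list
        for (int doc : must) {
            int matches = 0;
            for (int i = 0; i < shoulds.length; i++) {
                // "advance()": skip ahead in the SHOULD list to the first doc >= doc,
                // never revisiting the docs the MUST clause jumped over
                while (positions[i] < shoulds[i].length && shoulds[i][positions[i]] < doc) {
                    positions[i]++;
                }
                if (positions[i] < shoulds[i].length && shoulds[i][positions[i]] == doc) {
                    matches++;
                }
            }
            hits.add(new int[] {doc, matches});
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] must = {5, 100, 205};                 // sparse MUST clause skips far ahead
        int[][] shoulds = {{1, 5, 9, 100}, {2, 205}};
        for (int[] hit : score(must, shoulds)) {
            System.out.println(Arrays.toString(hit));
        }
    }
}
```

The key point is that each SHOULD cursor only ever moves forward, so the cost is bounded by the postings lengths even when the MUST clause skips hundreds of docs at a time.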






[jira] [Commented] (SOLR-5228) Don't require <field> or <dynamicField> be inside of <fields> -- or that <fieldType> be inside of <types>

2014-03-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942559#comment-13942559
 ] 

Tomás Fernández Löbbe commented on SOLR-5228:
-

What about increasing the schema version? It is currently 1.5. Solr could 
continue supporting 1.5 as it is now, with <fields> and <types>, and create 
version 1.6 that does not support those (and throws an exception if they are 
present). 5.x would support 1.6+ versions; 4.x would support both but use 1.6 
in the example. Anyone who needs to upgrade between 4.x versions can just keep 
their schema at 1.5, and anyone creating a new schema would start with 1.6.

> Don't require <field> or <dynamicField> be inside of <fields> -- or that 
> <fieldType> be inside of <types>
> -
>
> Key: SOLR-5228
> URL: https://issues.apache.org/jira/browse/SOLR-5228
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Hoss Man
>Assignee: Hoss Man
>
> On the solr-user mailing list, Nutan recently mentioned spending days trying 
> to track down a problem that turned out to be because he had attempted to add 
> a {{<field/>}} that was outside of the {{<fields>}} block in his 
> schema.xml -- Solr was just silently ignoring it.
> We have made improvements in other areas of config validation by generating 
> startup errors when tags/attributes are found that are not expected -- but in 
> this case i think we should just stop expecting/requiring that the 
> {{<fields>}} and {{<types>}} tags will be used to group these sorts of 
> things.  I think schema.xml parsing should just start ignoring them and only 
> care about finding the {{<field/>}}, {{<dynamicField/>}}, and {{<fieldType/>}} 
> tags wherever they may be.
> If people want to keep using them, fine.  If people want to mix fieldTypes 
> and fields side by side (perhaps specify a fieldType, then list all the 
> fields using it) fine.  I don't see any value in forcing people to use them, 
> but we definitely shouldn't leave things the way they are with otherwise 
> perfectly valid field/type declarations being silently ignored.
> ---
> I'll take this on unless i see any objections.
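The proposed behavior — collect the field and fieldType declarations wherever they occur, regardless of grouping wrappers — falls out naturally of depth-insensitive DOM lookups. A hypothetical illustration (this is not Solr's IndexSchema code; the schema snippet and class name are invented):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LenientSchemaParse {

    /** Count declarations of a tag at any depth, ignoring grouping wrappers. */
    static int count(String schemaXml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        // getElementsByTagName matches at any depth, so <fields>/<types> stop mattering
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) throws Exception {
        String schema =
            "<schema name='test' version='1.5'>"
            + "<fieldType name='string' class='solr.StrField'/>"  // outside any <types> block
            + "<fields><field name='id' type='string'/></fields>"
            + "<field name='title' type='string'/>"               // outside <fields>: ignored today
            + "</schema>";
        System.out.println("fields=" + count(schema, "field")
            + " fieldTypes=" + count(schema, "fieldType"));
    }
}
```

Here both `field` declarations are found even though one sits outside `<fields>`, which is exactly the declaration that Solr currently drops silently.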






[jira] [Commented] (LUCENE-5489) Add query rescoring API

2014-03-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942532#comment-13942532
 ] 

Robert Muir commented on LUCENE-5489:
-

This looks good, thanks for moving combine(), as the expression already 
indicates how to combine with the score. It would be cool for us to add that 
subclass in a followup issue, then we have a better feeling the abstractions 
are really working.

> Add query rescoring API
> ---
>
> Key: LUCENE-5489
> URL: https://issues.apache.org/jira/browse/LUCENE-5489
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch, 
> LUCENE-5489.patch
>
>
> When costly scoring factors are used during searching, a common
> approach is to do a cheaper / basic query first, collect the top few
> hundred hits, and then rescore those hits using the more costly
> query.
> It's not clear/simple to do this with Lucene today; I think we should
> make it easier.
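The two-pass idea in the description can be sketched in plain Java: rescore the cheap first-pass top hits with a costlier scorer and combine the two scores. This is only a sketch of the concept, not the LUCENE-5489 API; `CostlyScorer` and the 50/50 combine policy are invented for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

public class RescoreSketch {
    interface CostlyScorer { float score(int docID); }

    /** One possible combine policy: average the two passes. */
    static float combine(float firstPassScore, float secondPassScore) {
        return 0.5f * firstPassScore + 0.5f * secondPassScore;
    }

    /** hits[i] = {docID, firstPassScore}; returns hits re-sorted by combined score. */
    static float[][] rescore(float[][] hits, CostlyScorer costly) {
        float[][] rescored = new float[hits.length][];
        for (int i = 0; i < hits.length; i++) {
            float combined = combine(hits[i][1], costly.score((int) hits[i][0]));
            rescored[i] = new float[] {hits[i][0], combined};
        }
        // sort descending by combined score
        Arrays.sort(rescored, Comparator.comparingDouble((float[] h) -> -h[1]));
        return rescored;
    }

    public static void main(String[] args) {
        float[][] firstPass = {{7, 3.0f}, {2, 2.0f}, {9, 1.0f}};  // cheap query's top hits
        CostlyScorer costly = doc -> doc == 9 ? 10.0f : 0.0f;     // expensive model favors doc 9
        System.out.println(Arrays.deepToString(rescore(firstPass, costly)));
    }
}
```

Note that only the few hundred first-pass hits ever reach the costly scorer, which is the whole point of the pattern.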






[jira] [Commented] (LUCENE-5542) Explore making DVConsumer sparse-aware

2014-03-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942530#comment-13942530
 ] 

Robert Muir commented on LUCENE-5542:
-

The codec can already decide how to encode the values. Making the API more 
complicated doesn't seem to buy us anything. I'm open to a benchmark showing 
this, but I'm not seeing it.

> Explore making DVConsumer sparse-aware
> --
>
> Key: LUCENE-5542
> URL: https://issues.apache.org/jira/browse/LUCENE-5542
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Shai Erera
>
> Today DVConsumer API requires the caller to pass a value for every document, 
> where {{null}} means "this doc has no value". The Codec can then choose how 
> to encode the values, i.e. whether it encodes a 0 for a numeric field, or 
> encodes the sparse docs. In practice, from what I see, we choose to encode 
> the 0s.
> I wonder if we e.g. added an {{Iterable}} to 
> DVConsumer.addXYZField(), if that would make a better API. The caller only 
> passes <doc,value> pairs and it's up to the Codec to decide how it wants to 
> encode the missing values. Like, if a user's app truly has a sparse NDV, 
> IndexWriter doesn't need to "fill the gaps" artificially. It's the job of the 
> Codec.
> To be clear, I don't propose to change any Codec implementation in this issue 
> (w.r.t. sparse encoding - yes/no), only change the API to reflect that 
> sparseness. I think that if we'll ever want to encode sparse values, it will 
> be a more convenient API.
> Thoughts? I volunteer to do this work, but want to get others' opinion before 
> I start.
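The dense-vs-sparse tradeoff described above can be shown with a toy example (not a Codec implementation; names are invented): a 10-doc segment where only two docs carry a numeric doc value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SparseDVSketch {

    /** Dense encoding: "fill the gaps" with 0 for every doc, as IndexWriter does today. */
    static long[] denseEncode(int maxDoc, Map<Integer, Long> docToValue) {
        long[] dense = new long[maxDoc];  // one slot per doc, 0 for missing values
        for (Map.Entry<Integer, Long> e : docToValue.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        return dense;
    }

    public static void main(String[] args) {
        Map<Integer, Long> docToValue = new LinkedHashMap<>();  // the sparse <doc,value> pairs
        docToValue.put(3, 42L);
        docToValue.put(7, 99L);

        long[] dense = denseEncode(10, docToValue);
        // Sparse encoding would store only the pairs and let the Codec decide the rest
        System.out.println("dense entries=" + dense.length
            + " sparse entries=" + docToValue.size());
    }
}
```

With truly sparse fields the pairs-only representation is what the proposed API would hand the Codec, leaving the gap-filling decision (or a genuinely sparse encoding) to the Codec itself.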






[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-03-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942481#comment-13942481
 ] 

Michael McCandless commented on LUCENE-4396:


Using BooleanScorer (a Scorer) when there are one or more MUST clauses makes 
sense I think, but we need to test perf.  It could be that letting 
BooleanBulkScorer also handle MUST gives a good performance gain, in which case 
we could let both handle it ...

bq. Besides, I'm afraid that the name of BulkScorer may be confusing. 

That's a good point ... for a while on that issue we had the name TopScorer ... 
maybe we need to revisit that :)


> BooleanScorer should sometimes be used for MUST clauses
> ---
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 100 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!






[jira] [Updated] (LUCENE-5489) Add query rescoring API

2014-03-20 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5489:
---

Attachment: LUCENE-5489.patch

New patch, folding in feedback ... I think it's ready.

> Add query rescoring API
> ---
>
> Key: LUCENE-5489
> URL: https://issues.apache.org/jira/browse/LUCENE-5489
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch, 
> LUCENE-5489.patch
>
>
> When costly scoring factors are used during searching, a common
> approach is to do a cheaper / basic query first, collect the top few
> hundred hits, and then rescore those hits using the more costly
> query.
> It's not clear/simple to do this with Lucene today; I think we should
> make it easier.






[jira] [Updated] (LUCENE-5543) Remove Directory.fileExists

2014-03-20 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5543:
---

Attachment: LUCENE-5543.patch

Patch, I think it's ready.

> Remove Directory.fileExists
> ---
>
> Key: LUCENE-5543
> URL: https://issues.apache.org/jira/browse/LUCENE-5543
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5543.patch
>
>
> Since 3.0.x/3.6.x (see LUCENE-5541), Lucene has substantially removed
> its reliance on fileExists to the point where I think we can fully
> remove it now.
> Like the other iffy IO methods we've removed over time (touchFile,
> fileModified, seeking back during write, ...), File.exists is
> dangerous because a low level IO issue can cause it to return false
> when it should have returned true.  The fewer IO operations we rely on
> the more reliable/portable Lucene is.






[jira] [Created] (LUCENE-5543) Remove Directory.fileExists

2014-03-20 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5543:
--

 Summary: Remove Directory.fileExists
 Key: LUCENE-5543
 URL: https://issues.apache.org/jira/browse/LUCENE-5543
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.8, 5.0


Since 3.0.x/3.6.x (see LUCENE-5541), Lucene has substantially removed
its reliance on fileExists to the point where I think we can fully
remove it now.

Like the other iffy IO methods we've removed over time (touchFile,
fileModified, seeking back during write, ...), File.exists is
dangerous because a low level IO issue can cause it to return false
when it should have returned true.  The fewer IO operations we rely on
the more reliable/portable Lucene is.
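The workaround idea behind removing fileExists can be sketched as bookkeeping: instead of asking the filesystem (where File.exists can spuriously return false under IO pressure), track the set of live file names as they are created and deleted. This is a toy model to show the principle, not the actual FileExistsCachingDirectory patch; the method names loosely mirror Directory's but everything here is invented.

```java
import java.util.HashSet;
import java.util.Set;

public class TrackingDirectory {
    private final Set<String> liveFiles = new HashSet<>();

    // Record file lifecycle events instead of re-querying the filesystem later
    void createOutput(String name) { liveFiles.add(name); }
    void deleteFile(String name)   { liveFiles.remove(name); }

    /** Answered from our own bookkeeping; no fragile filesystem call needed. */
    boolean fileExists(String name) { return liveFiles.contains(name); }

    public static void main(String[] args) {
        TrackingDirectory dir = new TrackingDirectory();
        dir.createOutput("_0.cfs");
        dir.createOutput("segments_1");
        dir.deleteFile("_0.cfs");
        System.out.println(dir.fileExists("_0.cfs") + " " + dir.fileExists("segments_1"));
    }
}
```

Once callers track their own view of the directory contents, the unreliable exists check is simply never needed, which is the direction the issue argues for.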







[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter

2014-03-20 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942382#comment-13942382
 ] 

Simon Willnauer commented on LUCENE-4984:
-

I really like the base class! The patch LGTM +1 to commit

> Fix ThaiWordFilter
> --
>
> Key: LUCENE-4984
> URL: https://issues.apache.org/jira/browse/LUCENE-4984
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch
>
>
> ThaiWordFilter is an offender in TestRandomChains because it creates 
> positions and updates offsets.






[jira] [Commented] (LUCENE-5489) Add query rescoring API

2014-03-20 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942377#comment-13942377
 ] 

Simon Willnauer commented on LUCENE-5489:
-

yeah it can wait I guess - please go ahead and put a TODO

> Add query rescoring API
> ---
>
> Key: LUCENE-5489
> URL: https://issues.apache.org/jira/browse/LUCENE-5489
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch
>
>
> When costly scoring factors are used during searching, a common
> approach is to do a cheaper / basic query first, collect the top few
> hundred hits, and then rescore those hits using the more costly
> query.
> It's not clear/simple to do this with Lucene today; I think we should
> make it easier.






[jira] [Commented] (LUCENE-5489) Add query rescoring API

2014-03-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942369#comment-13942369
 ] 

Michael McCandless commented on LUCENE-5489:


bq. I also think this method should only be on QueryRescorer and not in the 
interface?

Woops, right, I'll move it.

{quote}
I also wonder why you extract the IDs and Scores, I think you should clone the 
scoreDocs array and sort that first. Then you can just sort the rescored 
scoreDocs array and simply merge the scores. Once you are done you resort the 
previously cloned array and we don't need to do all the auto boxing in that 
hashmap and it's the same sorting we already do?
{quote}

I think this can wait?  It's just an optimization (making the code more hairy 
but a bit faster).  I'll put a TODO...

> Add query rescoring API
> ---
>
> Key: LUCENE-5489
> URL: https://issues.apache.org/jira/browse/LUCENE-5489
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch
>
>
> When costly scoring factors are used during searching, a common
> approach is to do a cheaper / basic query first, collect the top few
> hundred hits, and then rescore those hits using the more costly
> query.
> It's not clear/simple to do this with Lucene today; I think we should
> make it easier.






[jira] [Commented] (LUCENE-5489) Add query rescoring API

2014-03-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942364#comment-13942364
 ] 

Michael McCandless commented on LUCENE-5489:


bq. I guess we should really just pass a boolean to make things clear.

I'll switch to a boolean; I agree the sig is weird now.

> Add query rescoring API
> ---
>
> Key: LUCENE-5489
> URL: https://issues.apache.org/jira/browse/LUCENE-5489
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch
>
>
> When costly scoring factors are used during searching, a common
> approach is to do a cheaper / basic query first, collect the top few
> hundred hits, and then rescore those hits using the more costly
> query.
> It's not clear/simple to do this with Lucene today; I think we should
> make it easier.






[jira] [Updated] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-03-20 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated SOLR-5865:
-

Attachment: SOLR-5865addendum2.patch

Here's a patch that depends on LuceneTestCase.  As I mentioned above, I haven't 
run the whole suite with this.

> Provide a MiniSolrCloudCluster to enable easier testing
> ---
>
> Key: SOLR-5865
> URL: https://issues.apache.org/jira/browse/SOLR-5865
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.7, 5.0
>Reporter: Gregory Chanan
>Assignee: Mark Miller
> Attachments: SOLR-5865.patch, SOLR-5865.patch, 
> SOLR-5865addendum.patch, SOLR-5865addendum2.patch
>
>
> Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, 
> which has a couple of issues around support for downstream projects:
> - It's difficult to test SolrCloud support in a downstream project that may 
> have its own test framework.  For example, some projects have support for 
> different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests 
> against each of the different backends.  This is difficult to do cleanly, 
> because the Solr tests require derivation from LuceneTestCase, while the 
> others don't
> - The LuceneTestCase class hierarchy is really designed for internal solr 
> tests (e.g. it randomizes a lot of parameters to get test coverage, but a 
> downstream project probably doesn't care about that).  It's also quite 
> complicated and dense, much more so than a downstream project would want.
> Given these reasons, it would be nice to provide a simple 
> "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or 
> HBase provides a MiniHBaseCluster.






[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-03-20 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942330#comment-13942330
 ] 

Gregory Chanan commented on SOLR-5865:
--

Hmm, at this point it may make more sense not to try to get the test to work 
completely outside of the test hierarchy.  We could try to recreate the minimum 
set of what we need there (SystemPropertiesRestoreRules and ThreadLeakScopes), 
but that may change in the test hierarchy itself, requiring just this test to 
be updated.  The important thing, I think, is that we don't require the 
complete SolrCloud test hierarchy, e.g. AbstractFullDistribZkTestBase and the like.

The question, then, is whether we rely on LuceneTestCase or SolrTestCaseJ4.  
LuceneTestCase is arguably better, because we know we don't rely on anything 
Solr-specific for the test, although the downside is we may have to update it 
to keep it in sync with SolrTestCaseJ4.  I don't have a strong preference 
either way.

I messed around with that a little bit and I have a patch that seems to work 
with just LuceneTestCase -- I had to import a couple of rules from 
SolrTestCaseJ4, but not much.  I haven't run the full suite though, so I'm not 
100% sure it's kosher.



> Provide a MiniSolrCloudCluster to enable easier testing
> ---
>
> Key: SOLR-5865
> URL: https://issues.apache.org/jira/browse/SOLR-5865
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.7, 5.0
>Reporter: Gregory Chanan
>Assignee: Mark Miller
> Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch
>
>
> Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, 
> which has a couple of issues around support for downstream projects:
> - It's difficult to test SolrCloud support in a downstream project that may 
> have its own test framework.  For example, some projects have support for 
> different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests 
> against each of the different backends.  This is difficult to do cleanly, 
> because the Solr tests require derivation from LuceneTestCase, while the 
> others don't
> - The LuceneTestCase class hierarchy is really designed for internal solr 
> tests (e.g. it randomizes a lot of parameters to get test coverage, but a 
> downstream project probably doesn't care about that).  It's also quite 
> complicated and dense, much more so than a downstream project would want.
> Given these reasons, it would be nice to provide a simple 
> "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or 
> HBase provides a MiniHBaseCluster.






[jira] [Created] (LUCENE-5542) Explore making DVConsumer sparse-aware

2014-03-20 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5542:
--

 Summary: Explore making DVConsumer sparse-aware
 Key: LUCENE-5542
 URL: https://issues.apache.org/jira/browse/LUCENE-5542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Shai Erera


Today DVConsumer API requires the caller to pass a value for every document, 
where {{null}} means "this doc has no value". The Codec can then choose how to 
encode the values, i.e. whether it encodes a 0 for a numeric field, or encodes 
the sparse docs. In practice, from what I see, we choose to encode the 0s.

I wonder if we e.g. added an {{Iterable}} to DVConsumer.addXYZField(), 
if that would make a better API. The caller only passes <doc,value> pairs and 
it's up to the Codec to decide how it wants to encode the missing values. Like, 
if a user's app truly has a sparse NDV, IndexWriter doesn't need to "fill the 
gaps" artificially. It's the job of the Codec.

To be clear, I don't propose to change any Codec implementation in this issue 
(w.r.t. sparse encoding - yes/no), only change the API to reflect that 
sparseness. I think that if we'll ever want to encode sparse values, it will be 
a more convenient API.

Thoughts? I volunteer to do this work, but want to get others' opinion before I 
start.






[jira] [Commented] (LUCENE-5489) Add query rescoring API

2014-03-20 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942278#comment-13942278
 ] 

Simon Willnauer commented on LUCENE-5489:
-

oh I see, the Float was to mark a match / non-match ... I guess we should 
really just pass a boolean to make things clear. 

> Add query rescoring API
> ---
>
> Key: LUCENE-5489
> URL: https://issues.apache.org/jira/browse/LUCENE-5489
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch
>
>
> When costly scoring factors are used during searching, a common
> approach is to do a cheaper / basic query first, collect the top few
> hundred hits, and then rescore those hits using the more costly
> query.
> It's not clear/simple to do this with Lucene today; I think we should
> make it easier.






[jira] [Commented] (LUCENE-5489) Add query rescoring API

2014-03-20 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942272#comment-13942272
 ] 

Simon Willnauer commented on LUCENE-5489:
-

hey mike, thanks for the new patch. I think you overlooked that one, though -- 
the signature looks funky:
{code}
protected abstract float combine(float firstPassScore, Float secondPassScore);
{code}

I guess we can use the primitive for both args? I also think this method should 
only be on QueryRescorer and not in the interface?

I also wonder why you extract the IDs and Scores, I think you should clone the 
scoreDocs array and sort that first. Then you can just sort the rescored 
scoreDocs array and simply merge the scores. Once you are done you resort the 
previously cloned array and we don't need to do all the auto boxing in that 
hashmap and it's the same sorting we already do?

> Add query rescoring API
> ---
>
> Key: LUCENE-5489
> URL: https://issues.apache.org/jira/browse/LUCENE-5489
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5489.patch, LUCENE-5489.patch, LUCENE-5489.patch
>
>
> When costly scoring factors are used during searching, a common
> approach is to do a cheaper / basic query first, collect the top few
> hundred hits, and then rescore those hits using the more costly
> query.
> It's not clear/simple to do this with Lucene today; I think we should
> make it easier.






[jira] [Created] (SOLR-5892) Document asynchronous OCP and CoreAdmin calls

2014-03-20 Thread Anshum Gupta (JIRA)
Anshum Gupta created SOLR-5892:
--

 Summary: Document asynchronous OCP and CoreAdmin calls
 Key: SOLR-5892
 URL: https://issues.apache.org/jira/browse/SOLR-5892
 Project: Solr
  Issue Type: Task
  Components: documentation
Reporter: Anshum Gupta
Assignee: Anshum Gupta


Document the feature committed via SOLR-5477.






Re: Analyzing primitive types, why can't we do this in Solr?

2014-03-20 Thread Erick Erickson
And one of my co-workers reminded me of what can be used in Solr right
now to accomplish this, so the problem is moot.

RegexReplaceProcessorFactory

It gets inserted into Solr's update process before the value is fed to the 
field, so any transformations one wants to do can be done there.

I guess the only thing I can't do would be to feed in multiple values
to the primitive type field, but that's no big deal.

I suppose for the special case of dates we could create a
DateFormatProcessorFactory that just took a list of standard Java
SimpleDateFormat strings and applied the first one that fit.
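The kind of transform being discussed — stripping non-numeric characters before a value reaches an int field, as an update processor such as RegexReplaceProcessorFactory could do — boils down to a one-line regex. A minimal sketch in plain Java (the class and method names are invented; this is not Solr plumbing):

```java
public class NumericCleanup {

    /** Strip everything but digits, then parse: "30asdf" -> 30. */
    static int cleanToInt(String raw) {
        String digitsOnly = raw.replaceAll("[^0-9]", "");  // drop letters, commas, etc.
        return Integer.parseInt(digitsOnly);
    }

    public static void main(String[] args) {
        // The indexed value becomes 30; the stored value would still be "30asdf",
        // just as with any other analysis step.
        System.out.println(cleanToInt("30asdf"));
    }
}
```

Running such a transform in the update chain, before the field ever sees the value, is what makes it work for primitive types that accept no analysis of their own.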


On Thu, Mar 20, 2014 at 11:57 AM, Erick Erickson
 wrote:
> Uwe:
>
> Thanks! I peeked at the code briefly and I see that it would
> be hard.
>
> Figured there was a good reason.
>
> What about a higher-level approach? I'm thinking a thin
> wrapper for Solr that would apply the analysis chains and feed
> the results into the native Lucene primitive processing. Seems
> kind of kludgy, I'm mostly wondering if it's conceptually
> possible/reasonable.
>
> Frankly, I'm not convinced there's enough call for something like
> this to justify the work/complexification though.
>
> Erick
>
> On Thu, Mar 20, 2014 at 11:17 AM, Uwe Schindler  wrote:
>> Hi Erick,
>>
>> The numerics are in fact "analyzed". The data is read using a Tokenizer that 
>> works on top of oal.analysis.NumericTokenStream from Lucene. This one 
>> produces the tokens from the numerical value given as native data type to 
>> the TokenStream. Those are indexed (in fact, it is binary data in different 
>> precisions according to the precision step).
>> Additional analysis on top of that is not easily possible, because the 
>> Tokenizer does all the work; there is no way to inject a TokenFilter. 
>> Theoretically, there would only be the possibility to add a CharFilter 
>> before the numeric tokenizer. But the field type does not allow that 
>> at the moment, because the "analysis" is hardcoded in the field type.
>>
>>
>> Uwe
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>>> -Original Message-
>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>> Sent: Thursday, March 20, 2014 6:52 PM
>>> To: dev@lucene.apache.org
>>> Subject: Analyzing primitive types, why can't we do this in Solr?
>>>
>>> It's bugged me for a while that we can't define any analysis on primitive
>>> types. This is especially acute with date types, we require a very exact 
>>> format
>>> and have to tell people "transform it correctly on the ingestion side", or
>>> "create an custom update processor that transforms it".
>>>
>>> I thought I remembered something about being able to do this, but can't find
>>> it. I suspect I was confusing it with DIH.
>>>
>>> What's the reason for primitive types being unanalyzed? Just "it's always
>>> been that way", or "it would lead to a very sticky wicket we never wanted to
>>> get stuck in"? Both are perfectly valid, I'm just sayin'.
>>>
>>> I realize this would provide some "interesting" output. Say you defined a
>>> regex for an int type that removed all non-numerics. If the input was
>>> "30asdf" and it was transformed correctly into 30 for the underlying int 
>>> field,
>>> it would still come back as 30asdf from the stored data, but that's true 
>>> about
>>> all analysis steps.
>>>
>>> Or perhaps you'd like to have a string of integers as input to a 
>>> multiValued int
>>> field. Or
>>>
>>> Musings sparked by seeing this crop up again in another context.
>>>
>>> Erick
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>>> commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5541) FileExistsCachingDirectory, to work around unreliable File.exists

2014-03-20 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5541:
---

Attachment: LUCENE-5541.patch

Patch with two classes:

  * FileExistsCachingDirectory, to work around File.exists unreliability

  * FixCFS to re-insert a missing CFS sub-file if you hit this corruption

> FileExistsCachingDirectory, to work around unreliable File.exists
> -
>
> Key: LUCENE-5541
> URL: https://issues.apache.org/jira/browse/LUCENE-5541
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Reporter: Michael McCandless
> Attachments: LUCENE-5541.patch
>
>
> File.exists is a dangerous method in Java, because if there is a
> low-level IOException (permission denied, out of file handles, etc.)
> the method can return false when it should return true.
> Fortunately, as of Lucene 4.x, we rely much less on File.exists,
> because we track which files the codec components created, and we know
> those files then exist.
> But, unfortunately, going from 3.0.x to 3.6.x, we increased our
> reliance on File.exists, e.g. when creating CFS we check File.exists
> on each sub-file before trying to add it, and I have a customer
> corruption case where apparently a transient low level IOE caused
> File.exists to incorrectly return false for one of the sub-files.  It
> results in corruption like this:
> {noformat}
>   java.io.FileNotFoundException: No sub-file with id .fnm found 
> (fileName=_1u7.cfs files: [.tis, .tii, .frq, .prx, .fdt, .nrm, .fdx])
>   org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157)
>   org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:146)
>   org.apache.lucene.index.FieldInfos.&lt;init&gt;(FieldInfos.java:71)
>   org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212)
>   org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228)
>   org.apache.lucene.index.IndexWriter.&lt;init&gt;(IndexWriter.java:1161)
> {noformat}
> I think typically local file systems don't often hit such low level
> errors, but if you have an index on a remote filesystem, where network
> hiccups can cause problems, it's more likely.
> As a simple workaround, I created a basic Directory delegator that
> holds a Set of all created but not deleted files, and short-circuits
> fileExists to return true if the file is in that set.
> I don't plan to commit this: we aren't doing bug-fix releases on
> 3.6.x anymore (it's very old by now), and this problem is already
> "fixed" in 4.x (by reducing our reliance on File.exists), but I wanted
> to post the code here in case others hit it.  It looks like it was hit
> e.g. https://netbeans.org/bugzilla/show_bug.cgi?id=189571 and
> https://issues.jboss.org/browse/ISPN-2981 
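The workaround described above can be sketched as a thin delegating wrapper. This is a simplified stand-in, not the actual `FileExistsCachingDirectory` from the patch: `FileStore` is a hypothetical interface playing the role of Lucene's `Directory`, kept minimal so the caching idea stands out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Lucene's Directory abstraction.
interface FileStore {
    boolean exists(String name);   // may spuriously return false on transient IOE
    void create(String name);
    void delete(String name);
}

// Track files we created ourselves and short-circuit exists() checks, so a
// transient low-level failure can no longer make a known file "disappear".
class FileExistsCachingStore implements FileStore {
    private final FileStore delegate;
    private final Set<String> created = ConcurrentHashMap.newKeySet();

    FileExistsCachingStore(FileStore delegate) { this.delegate = delegate; }

    @Override public void create(String name) {
        delegate.create(name);
        created.add(name);          // remember: this file must exist now
    }

    @Override public void delete(String name) {
        delegate.delete(name);
        created.remove(name);
    }

    @Override public boolean exists(String name) {
        // Trust our own bookkeeping before asking the (unreliable) delegate.
        return created.contains(name) || delegate.exists(name);
    }
}
```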



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4890) QueryTreeBuilder.getBuilder() only finds interfaces on the most derived class

2014-03-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942162#comment-13942162
 ] 

ASF subversion and git services commented on LUCENE-4890:
-

Commit 1579717 from [~mikemccand] in branch 'dev/branches/lucene_solr_3_6'
[ https://svn.apache.org/r1579717 ]

LUCENE-4890: get this test passing again

> QueryTreeBuilder.getBuilder() only finds interfaces on the most derived class
> -
>
> Key: LUCENE-4890
> URL: https://issues.apache.org/jira/browse/LUCENE-4890
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 2.9, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 3.0, 3.0.1, 3.0.2, 
> 3.0.3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.6.1, 3.6.2
> Environment: Lucene 3.3.0 on Win32
>Reporter: Philip Searle
>Assignee: Adriano Crestani
>Priority: Minor
> Fix For: 3.6.3, 4.4, 5.0
>
> Attachments: LUCENE-4890_2013_05_25.patch
>
>
> QueryBuilder implementations registered with QueryTreeBuilder.setBuilder() 
> are not recognized by QueryTreeBuilder.getBuilder() if they are registered 
> for an interface implemented by a superclass. Registering them for a concrete 
> query node class or an interface implemented by the most-derived class do 
> work.
> {code:title=example.java|borderStyle=solid}
> /* Our custom query builder */
> class CustomQueryTreeBuilder extends QueryTreeBuilder {
>   public CustomQueryTreeBuilder() {
> /* Turn field:"value" into an application-specific object */
> setBuilder(FieldQueryNode.class, new QueryBuilder() {
>   @Override
>   public Object build(QueryNode queryNode) {
> FieldQueryNode node = (FieldQueryNode) queryNode;
> return new ApplicationSpecificClass(node.getFieldAsString());
>   }
> });
> /* Ignore all other query node types */
> setBuilder(QueryNode.class, new  QueryBuilder() {
>   @Override
>   public Object build(QueryNode queryNode) {
> return null;
>   }
> });
>   }
> }
> /* Assume this is in the main program: */
> StandardQueryParser queryParser = new StandardQueryParser();
> queryParser.setQueryBuilder(new CustomQueryTreeBuilder());
> /* The following line will throw an exception because it can't find a builder 
> for BooleanQueryNode.class */
> Object queryObject = queryParser.parse("field:\"value\" field2:\"value2\"", 
> "field");
> {code}
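The fix the report implies can be sketched as a hierarchy walk that checks each level's interfaces, not only those of the most-derived class. This is a hypothetical `findBuilder`, not the actual `QueryTreeBuilder` code; the nested `Example*` types reproduce the reported scenario of a builder registered for an interface implemented by a superclass.

```java
import java.util.Map;

class BuilderLookup {
    // Walk the node's class hierarchy; at each level, also consider that
    // level's directly implemented interfaces.
    static Object findBuilder(Map<Class<?>, Object> builders, Class<?> nodeClass) {
        for (Class<?> c = nodeClass; c != null; c = c.getSuperclass()) {
            Object b = builders.get(c);
            if (b != null) return b;
            for (Class<?> iface : c.getInterfaces()) { // interfaces at THIS level
                b = builders.get(iface);
                if (b != null) return b;
            }
        }
        return null;
    }

    // Hypothetical node hierarchy: the builder is registered for an
    // interface implemented by a superclass, the case the issue reports.
    interface ExampleQueryNode {}
    static class ExampleFieldNode implements ExampleQueryNode {}
    static class ExampleBoostedNode extends ExampleFieldNode {}
}
```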



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Analyzing primitive types, why can't we do this in Solr?

2014-03-20 Thread Erick Erickson
Uwe:

Thanks! I peeked at the code briefly and I see that it would
be hard.

Figured there was a good reason.

What about a higher-level approach? I'm thinking a thin
wrapper for Solr that would apply the analysis chains and feed
the results into the native Lucene primitive processing. Seems
kind of kludgy, I'm mostly wondering if it's conceptually
possible/reasonable.

Frankly, I'm not convinced there's enough call for something like
this to justify the work/complexification though.

Erick

On Thu, Mar 20, 2014 at 11:17 AM, Uwe Schindler  wrote:
> Hi Erick,
>
> The numerics are in fact "analyzed". The data is read using a Tokenizer that 
> works on top of oal.analysis.NumericTokenStream from Lucene. This one 
> produces the tokens from the numerical value given as native data type to the 
> TokenStream. Those are indexed (in fact, it is binary data in different 
> precisions according to the precision step).
> Additional analysis on top of that is not easily possible, because the 
> Tokenizer does all the work; there is no way to inject a TokenFilter. 
> Theoretically, there would only be the possibility to add a CharFilter before 
> the numeric tokenizer. But the field type does not allow to do that at the 
> moment, because the "analysis" is hardcoded in the field type.
>
>
> Uwe
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Thursday, March 20, 2014 6:52 PM
>> To: dev@lucene.apache.org
>> Subject: Analyzing primitive types, why can't we do this in Solr?
>>
>> It's bugged me for a while that we can't define any analysis on primitive
>> types. This is especially acute with date types, we require a very exact 
>> format
>> and have to tell people "transform it correctly on the ingestion side", or
>> "create a custom update processor that transforms it".
>>
>> I thought I remembered something about being able to do this, but can't find
>> it. I suspect I was confusing it with DIH.
>>
>> What's the reason for primitive types being unanalyzed? Just "it's always
>> been that way", or "it would lead to a very sticky wicket we never wanted to
>> get stuck in"? Both are perfectly valid, I'm just sayin'.
>>
>> I realize this would provide some "interesting" output. Say you defined a
>> regex for an int type that removed all non-numerics. If the input was
>> "30asdf" and it was transformed correctly into 30 for the underlying int 
>> field,
>> it would still come back as 30asdf from the stored data, but that's true 
>> about
>> all analysis steps.
>>
>> Or perhaps you'd like to have a string of integers as input to a multiValued 
>> int
>> field. Or
>>
>> Musings sparked by seeing this crop up again in another context.
>>
>> Erick
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>> commands, e-mail: dev-h...@lucene.apache.org
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5891) Problems installing Apache Solr with Apache Tomcat

2014-03-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe resolved SOLR-5891.
-

Resolution: Invalid

Looks like a configuration issue. You are trying to set the solr home to 
/opt/solr but apparently Solr is reading from /root/solr-4.7.0/example/solr 
(maybe you set that somewhere else?).
Anyway, you should raise this question in the users list: 
https://lucene.apache.org/solr/discussion.html

> Problems installing Apache Solr with Apache Tomcat
> --
>
> Key: SOLR-5891
> URL: https://issues.apache.org/jira/browse/SOLR-5891
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, SolrCloud
>Affects Versions: 4.7
> Environment: Centos 6.5 Installing Apache Tomcat 7.0.52 with Apache 
> Solr 4.7.0
>Reporter: Dean Zambrano
>  Labels: build, newbie
> Fix For: 4.7
>
>   Original Estimate: 26h
>  Remaining Estimate: 26h
>
> I installed Apache Solr, version 4.7.0. As part of the install, I performed 
> the following: 
> (Based on these instructions: 
> http://sachkadam.wordpress.com/2013/04/29/solr-installation-on-centos-6/) -> 
> The summary is as follows:
> -  I moved the "/example/solr" directory to /opt/solr. 
> -  I created a "solr.xml" file with contains the following xml code:
> # more solr.xml
> <Context ...>
>   <Environment ... override="true"/>
> </Context>
> The "solr.xml" file is located in: 
> /usr/share/apache-tomcat-7.0.52/conf/Catalina/localhost
> **When I try to access solr through the following URL: 
> http://107.170.94.202:8983, I receive the following error: 
> {msg=SolrCore 'collection1' is not available due to init failure: Could not 
> load config file 
> /root/solr-4.7.0/example/solr/collection1/solrconfig.xml,trace=org.apache.solr.common.SolrException:
>  SolrCore 'collection1' is not available due to init failure: Could not load 
> config file /root/solr-4.7.0/example/solr/collection1/solrconfig.xml
>   at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:827)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:317)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>   at org.eclipse.jetty.server.Server.handle(Server.java:368)
>   at 
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>   at 
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>   at 
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>   at 
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>   at 
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>   at 
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.solr.common.SolrException: Could not load config file 
> /root/solr-4.7.0/example/solr/collection1/solrconfig.xml
>   at 
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:530)
>   at org.apache.s

[jira] [Commented] (SOLR-5890) Delete silently fails if not sent to shard where document was added

2014-03-20 Thread Brett Hoerner (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942114#comment-13942114
 ] 

Brett Hoerner commented on SOLR-5890:
-

I believe I have the same issue (using implicit also). Is there any way for me 
as the user to send the equivalent of "_route_" with a delete by ID? I have 
enough information to target the right shard, I'm just not sure how to "tell" 
it that.

> Delete silently fails if not sent to shard where document was added
> ---
>
> Key: SOLR-5890
> URL: https://issues.apache.org/jira/browse/SOLR-5890
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.7
> Environment: Debian 7.4.
>Reporter: Peter Inglesby
> Fix For: 4.8, 5.0, 4.7.1
>
>
> We have SolrCloud set up with two shards, each with a leader and a replica.  
> We use haproxy to distribute requests between the four nodes.
> Regardless of which node we send an add request to, following a commit, the 
> newly-added document is returned in a search, as expected.
> However, we can only delete a document if the delete request is sent to a 
> node in the shard where the document was added.  If we send the delete 
> request to a node in the other shard (and then send a commit) the document is 
> not deleted.  Such a delete request will get a 200 response, with the 
> following body:
>   {'responseHeader'=>{'status'=>0,'QTime'=>7}}
> Apart from the the very low QTime, this is indistinguishable from a 
> successful delete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble

2014-03-20 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942112#comment-13942112
 ] 

Shawn Heisey commented on SOLR-5852:


The inclusion of my patch should not be taken as an endorsement on this issue.  
I had these ideas floating around in my head that wanted to be put into actual 
code, so I acquiesced and wrote it.

I don't believe we need this at all.  If there's consensus that disagrees, then 
I think it requires the robustness that I put into my patch.

> Add CloudSolrServer helper method to connect to a ZK ensemble
> -
>
> Key: SOLR-5852
> URL: https://issues.apache.org/jira/browse/SOLR-5852
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
> Attachments: SOLR-5852-SH.patch, SOLR-5852-SH.patch, SOLR-5852.patch, 
> SOLR-5852_FK.patch, SOLR-5852_FK.patch
>
>
> We should have a CloudSolrServer constructor which takes a list of ZK servers 
> to connect to.
> Something Like 
> {noformat}
> public CloudSolrServer(String... zkHost);
> {noformat}
> - Document the current constructor better to mention that to connect to a ZK 
> ensemble you can pass a comma-delimited list of ZK servers like 
> zk1:2181,zk2:2181,zk3:2181
> - Thirdly should getLbServer() and getZKStatereader() be public?
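The proposed varargs constructor largely reduces to building the comma-delimited ensemble string that the existing `String` constructor already accepts. A minimal sketch under that assumption (no `CloudSolrServer` is actually constructed here, and `zkEnsembleString` is a hypothetical helper name):

```java
class ZkConnect {
    // Join individual "host:port" entries into the comma-delimited ZK
    // connection string, e.g. "zk1:2181,zk2:2181,zk3:2181".
    static String zkEnsembleString(String... zkHosts) {
        return String.join(",", zkHosts);
    }
}
```

The robustness question raised in the comments (chroot suffixes, whitespace, duplicate hosts) is exactly what such a one-liner does not handle.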



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2014-03-20 Thread Andrew Buchanan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942091#comment-13942091
 ] 

Andrew Buchanan commented on SOLR-2649:
---

Ping for Jan Høydahl to review

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Priority: Minor
> Fix For: 4.8
>
> Attachments: SOLR-2649.diff
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.
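Restating the quoted condition in isolation (a simplified re-statement, not the actual Solr code) makes the scenario concrete: adding the "-stockings" clause makes the minus count nonzero, so mm is silently dropped.

```java
class EdismaxMmDemo {
    // mm ("min should match") is applied only when the query contains no
    // explicit OR/NOT/+/- operators -- the behavior the issue questions.
    static boolean applyMinShouldMatch(int numOR, int numNOT, int numPluses, int numMinuses) {
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }
}
```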



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Analyzing primitive types, why can't we do this in Solr?

2014-03-20 Thread Uwe Schindler
Hi Erick,

The numerics are in fact "analyzed". The data is read using a Tokenizer that 
works on top of oal.analysis.NumericTokenStream from Lucene. This one produces 
the tokens from the numerical value given as native data type to the 
TokenStream. Those are indexed (in fact, it is binary data in different 
precisions according to the precision step).
Additional analysis on top of that is not easily possible, because the Tokenizer 
does all the work; there is no way to inject a TokenFilter. Theoretically, 
there would only be the possibility to add a CharFilter before the numeric 
tokenizer. But the field type does not allow to do that at the moment, because 
the "analysis" is hardcoded in the field type.
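The precision-step encoding described above can be illustrated with a toy that is not Lucene code: `NumericTokenStream` conceptually emits one token per precision level, each a prefix of the value with low-order bits dropped, so range queries can match a few coarse prefixes instead of enumerating every value.

```java
import java.util.ArrayList;
import java.util.List;

class TrieTokensDemo {
    // One token per precision level; coarser precision as `shift` grows.
    static List<Long> trieTokens(long value, int precisionStep) {
        List<Long> tokens = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            tokens.add(value >>> shift); // drop the lowest `shift` bits
        }
        return tokens;
    }
}
```

The real token stream also encodes the shift into each term's bytes; this sketch keeps only the bit-dropping idea.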


Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, March 20, 2014 6:52 PM
> To: dev@lucene.apache.org
> Subject: Analyzing primitive types, why can't we do this in Solr?
> 
> It's bugged me for a while that we can't define any analysis on primitive
> types. This is especially acute with date types, we require a very exact 
> format
> and have to tell people "transform it correctly on the ingestion side", or
> "create a custom update processor that transforms it".
> 
> I thought I remembered something about being able to do this, but can't find
> it. I suspect I was confusing it with DIH.
> 
> What's the reason for primitive types being unanalyzed? Just "it's always
> been that way", or "it would lead to a very sticky wicket we never wanted to
> get stuck in"? Both are perfectly valid, I'm just sayin'.
> 
> I realize this would provide some "interesting" output. Say you defined a
> regex for an int type that removed all non-numerics. If the input was
> "30asdf" and it was transformed correctly into 30 for the underlying int 
> field,
> it would still come back as 30asdf from the stored data, but that's true about
> all analysis steps.
> 
> Or perhaps you'd like to have a string of integers as input to a multiValued 
> int
> field. Or
> 
> Musings sparked by seeing this crop up again in another context.
> 
> Erick
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5860) Logging around core wait for state during startup / recovery is confusing

2014-03-20 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942076#comment-13942076
 ] 

Shalin Shekhar Mangar commented on SOLR-5860:
-

I'm seeing some test failures with the patch. Ran it twice already. I have to 
call it a day but if nobody else gets to it first, I'll debug tomorrow and 
commit.

{quote}
   [junit4] Tests with failures:
   [junit4]   - 
org.apache.solr.handler.component.TermVectorComponentDistributedTest.testDistribSearch
   [junit4]   - 
org.apache.solr.handler.component.DistributedExpandComponentTest.testDistribSearch
   [junit4]   - 
org.apache.solr.handler.component.DistributedSuggestComponentTest.testDistribSearch
   [junit4]   - org.apache.solr.TestDistributedGrouping.testDistribSearch
   [junit4]   - 
org.apache.solr.handler.component.DistributedTermsComponentTest.testDistribSearch
   [junit4]   - 
org.apache.solr.handler.component.DistributedSpellCheckComponentTest.testDistribSearch
   [junit4]   - org.apache.solr.TestDistributedMissingSort.testDistribSearch
   [junit4]   - org.apache.solr.TestDistributedSearch.testDistribSearch
   [junit4]   - 
org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.testDistribSearch
   [junit4] 
{quote}

> Logging around core wait for state during startup / recovery is confusing
> -
>
> Key: SOLR-5860
> URL: https://issues.apache.org/jira/browse/SOLR-5860
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Timothy Potter
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Attachments: SOLR-5860.patch
>
>
> I'm seeing some log messages like this:
> I was asked to wait on state recovering for HOST:8984_solr but I still do not 
> see the requested state. I see state: recovering live:true
> This is very confusing because from the log, it seems like it's waiting to 
> see the state it's in ... After digging through the code, it appears that it 
> is really waiting for a leader to become active so that it has a leader to 
> recover from.
> I'd like to improve the logging around this critical wait loop to give better 
> context to what is happening. 
> Also, I would like to change the following so that we force state updates 
> every 15 seconds for the entire wait period.
> -  if (retry == 15 || retry == 60) {
> +  if (retry % 15 == 0) {
> As-is, it's waiting 120 seconds but only forcing the state to update twice, 
> once after 15 seconds and again after 60 … might be good to force updates for 
> the full wait period.
> Lastly, I think it would be good to use the leaderConflictResolveWait setting 
> (from ZkController) here as well since 120 may not be enough for a leader to 
> become active in a busy cluster, esp. after the node the Overseer is running 
> on. Maybe leaderConflictResolveWait + 5 seconds?
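The proposed loop change can be sketched in isolation (a hypothetical re-statement, not the actual ZkController code): the old condition forces a cluster-state refresh only twice in a 120-retry wait, while `retry % 15 == 0` forces one every 15 retries for the whole period.

```java
class WaitLoopDemo {
    // Count how often a state refresh would be forced during the wait loop.
    static int forcedUpdates(int totalRetries, boolean proposed) {
        int forced = 0;
        for (int retry = 1; retry <= totalRetries; retry++) {
            boolean force = proposed ? (retry % 15 == 0)            // proposed
                                     : (retry == 15 || retry == 60); // as-is
            if (forced >= 0 && force) forced++; // a real loop would refresh ZK state here
        }
        return forced;
    }
}
```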



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_51) - Build # 9857 - Still Failing!

2014-03-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9857/
Java: 64bit/jdk1.7.0_51 -XX:+UseCompressedOops -XX:+UseSerialGC 
-XX:-UseSuperWord

1 tests failed.
FAILED:  
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin 
{#8 seed=[B2C8C580BA165153:878CD25E862E637E]}

Error Message:
Shouldn't match I#4:Rect(minX=139.0,maxX=148.0,minY=119.0,maxY=121.0) 
Q:Pt(x=128.0,y=117.0)

Stack Trace:
java.lang.AssertionError: Shouldn't match 
I#4:Rect(minX=139.0,maxX=148.0,minY=119.0,maxY=121.0) Q:Pt(x=128.0,y=117.0)
at 
__randomizedtesting.SeedInfo.seed([B2C8C580BA165153:878CD25E862E637E]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:355)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:335)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin(SpatialOpRecursivePrefixTreeTest.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
at java.lang.Thread.run(Thread.java:744)




Build Log:
[...truncated 9084 lines...]
   [junit4] Suite: 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest
   [junit4]   1> Strategy: 
RecursiveP

[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble

2014-03-20 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942057#comment-13942057
 ] 

Shawn Heisey commented on SOLR-5852:


bq. As I read this, I don't quite see the utility of offering all the different 
ways of specifying the ensemble.
bq. Aren't these all handled by the "typical" ZK ensemble connection string?

I actually agree.  But if a method like this is created, that's how I would 
want to do it.


> Add CloudSolrServer helper method to connect to a ZK ensemble
> -
>
> Key: SOLR-5852
> URL: https://issues.apache.org/jira/browse/SOLR-5852
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
> Attachments: SOLR-5852-SH.patch, SOLR-5852-SH.patch, SOLR-5852.patch, 
> SOLR-5852_FK.patch, SOLR-5852_FK.patch
>
>
> We should have a CloudSolrServer constructor which takes a list of ZK servers 
> to connect to.
> Something Like 
> {noformat}
> public CloudSolrServer(String... zkHost);
> {noformat}
> - Document the current constructor better to mention that to connect to a ZK 
> ensemble you can pass a comma-delimited list of ZK servers like 
> zk1:2181,zk2:2181,zk3:2181
> - Thirdly should getLbServer() and getZKStatereader() be public?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble

2014-03-20 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942046#comment-13942046
 ] 

Furkan KAMACI commented on SOLR-5852:
-

[~erickerickson] could you check my comments and my patch?







Analyzing primitive types, why can't we do this in Solr?

2014-03-20 Thread Erick Erickson
It's bugged me for a while that we can't define any analysis on
primitive types. This is especially acute with date types: we require
a very exact format and have to tell people "transform it correctly on
the ingestion side" or "create a custom update processor that
transforms it".

I thought I remembered something about being able to do this, but
can't find it. I suspect I was confusing it with DIH.

What's the reason for primitive types being unanalyzed? Just "it's
always been that way", or "it would lead to a very sticky wicket we
never wanted to get stuck in"? Both are perfectly valid, I'm just
sayin'.

I realize this would provide some "interesting" output. Say you
defined a regex for an int type that removed all non-numerics. If the
input was "30asdf" and it was transformed correctly into 30 for the
underlying int field, it would still come back as 30asdf from the
stored data, but that's true about all analysis steps.
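To make that concrete, here is a standalone sketch of the transform described above (a hypothetical regex-style cleanup, not an actual Solr field type or analyzer):

```java
// Hypothetical sketch: strip non-digits before indexing into an int field,
// the kind of transform a regex analysis step on a primitive type might do.
// Note: as described above, the *stored* value would still be the raw string.
public class StripNonNumerics {
    public static int toInt(String raw) {
        String digits = raw.replaceAll("[^0-9]", "");
        return Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        System.out.println(toInt("30asdf")); // indexed value: 30
    }
}
```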

Or perhaps you'd like to have a string of integers as input to a
multiValued int field. Or

Musings sparked by seeing this crop up again in another context.

Erick




[jira] [Resolved] (SOLR-5888) SyncSliceTest is slower than it should be.

2014-03-20 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-5888.
---

Resolution: Fixed

> SyncSliceTest is slower than it should be.
> --
>
> Key: SOLR-5888
> URL: https://issues.apache.org/jira/browse/SOLR-5888
> Project: Solr
>  Issue Type: Test
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.8, 5.0
>
>
> This test is surprisingly slow. Turns out, it's waiting around in many cases 
> when it does not necessarily need to.
> Part of the fix should speed up some other tests a bit as well.






[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-03-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942015#comment-13942015
 ] 

Mark Miller commented on SOLR-5865:
---

Hmm - nope, I can still see something. It's just becoming rarer. I think this 
one is from threads leaking past the end of the test. The test framework has a 
linger for this and attempts interrupts (the linger has proven especially 
important for zk tests, which have threads that can seem to linger for a while 
after shutdown).

> Provide a MiniSolrCloudCluster to enable easier testing
> ---
>
> Key: SOLR-5865
> URL: https://issues.apache.org/jira/browse/SOLR-5865
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.7, 5.0
>Reporter: Gregory Chanan
>Assignee: Mark Miller
> Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch
>
>
> Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, 
> which has a couple of issues around support for downstream projects:
> - It's difficult to test SolrCloud support in a downstream project that may 
> have its own test framework.  For example, some projects have support for 
> different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests 
> against each of the different backends.  This is difficult to do cleanly, 
> because the Solr tests require derivation from LuceneTestCase, while the 
> other don't
> - The LuceneTestCase class hierarchy is really designed for internal solr 
> tests (e.g. it randomizes a lot of parameters to get test coverage, but a 
> downstream project probably doesn't care about that).  It's also quite 
> complicated and dense, much more so than a downstream project would want.
> Given these reasons, it would be nice to provide a simple 
> "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or 
> HBase provides a MiniHBaseCluster.






[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble

2014-03-20 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941974#comment-13941974
 ] 

Erick Erickson commented on SOLR-5852:
--

As I read this, I don't quite see the utility of offering all the different ways
of specifying the ensemble.

1> ("host1:2181", "/mychroot")
2> ("127.0.0.1:3000", "127.0.0.1:3001", "127.0.0.1:3002")
3> ("localhost:2181", "localhost:2181", "localhost:2181/solrtwo")
4> ("zoo1:2181", "zoo2:2181", "zoo3:2181", "/solr-three")
5> ("zoo1.example.com:2181", "zoo2.example.com:2181", "zoo3.example.com:2181", "/solr-three")
6> ("zoo1:2181/root", "zoo2:2181/root", "zoo3:2181/root")

Aren't these all handled by the "typical" ZK ensemble connection string? I.e.
1> "host1:2181/mychroot"
2> "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"
3> "localhost:2181,localhost:2181,localhost:2181/solrtwo"
4> like 3
5> like 3
6> like 3

I confess I'm just looking at it from a rather ignorant level, but it seems
like this would add complexity for no added functionality. Of course I may be
missing a lot; if there are places where this kind of processing is
_already_ being done, and this moves things into a c'tor, that would
be a reason.

I'd rather have a single form than multiple forms, unless the multiple
forms give me added functionality. Otherwise, one adds maintenance
without adding value.

Let me know if I've missed the boat here.
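For illustration, every list-of-hosts form above collapses into the single "typical" connection string. A hypothetical helper (not part of any patch on this issue) might look like:

```java
// Hypothetical helper: collapse varargs ZK host specs plus an optional
// chroot into the single comma-delimited ensemble connection string.
public class ZkEnsemble {
    public static String connectString(String chroot, String... hosts) {
        String joined = String.join(",", hosts);
        // A chroot (e.g. "/mychroot") is appended once, after the last host.
        return (chroot == null || chroot.isEmpty()) ? joined : joined + chroot;
    }

    public static void main(String[] args) {
        System.out.println(connectString("/solr-three",
                "zoo1:2181", "zoo2:2181", "zoo3:2181"));
        // prints: zoo1:2181,zoo2:2181,zoo3:2181/solr-three
    }
}
```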









[jira] [Commented] (SOLR-5232) SolrCloud should distribute updates via streaming rather than buffering.

2014-03-20 Thread Jessica Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941957#comment-13941957
 ] 

Jessica Cheng commented on SOLR-5232:
-

Just curious--has anyone gotten a chance to run it both before and after this 
change to see if throughput is improved?

> SolrCloud should distribute updates via streaming rather than buffering.
> 
>
> Key: SOLR-5232
> URL: https://issues.apache.org/jira/browse/SOLR-5232
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Critical
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5232.patch, SOLR-5232.patch, SOLR-5232.patch, 
> SOLR-5232.patch, SOLR-5232.patch, SOLR-5232.patch
>
>
> The current approach was never the best for SolrCloud - it was designed for a 
> pre SolrCloud Solr - it also uses too many connections and threads - nailing 
> that down is likely wasted effort when we should really move away from 
> explicitly buffering docs and sending small batches per thread as we have 
> been doing.






[jira] [Updated] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble

2014-03-20 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5852:
---

Attachment: SOLR-5852-SH.patch

New patch against trunk.  Previous patch was against trunk too, but a couple of 
hours after I went to bed, a conflicting patch was committed.

This does make a change to the CloudSolrServerTest bits that were just added, but 
only to eliminate warnings.  It does not change the functionality.








[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble

2014-03-20 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941912#comment-13941912
 ] 

Erick Erickson commented on SOLR-5852:
--

Yeah, I saw that. Was it against trunk or 4x? That might account for the 
difference.

BTW, I thought I'd mention that sleep is a good thing :).

> Add CloudSolrServer helper method to connect to a ZK ensemble
> -
>
> Key: SOLR-5852
> URL: https://issues.apache.org/jira/browse/SOLR-5852
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
> Attachments: SOLR-5852-SH.patch, SOLR-5852.patch, SOLR-5852_FK.patch, 
> SOLR-5852_FK.patch
>
>
> We should have a CloudSolrServer constructor which takes a list of ZK servers 
> to connect to.
> Something Like 
> {noformat}
> public CloudSolrServer(String... zkHost);
> {noformat}
> - Document the current constructor better to mention that to connect to a ZK 
> ensemble you can pass a comma-delimited list of ZK servers like 
> zk1:2181,zk2:2181,zk3:2181
> - Thirdly should getLbServer() and getZKStatereader() be public?






[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-03-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941914#comment-13941914
 ] 

ASF subversion and git services commented on SOLR-5865:
---

Commit 1579682 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1579682 ]

SOLR-5865: Un@Ignore test again.







[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-03-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941910#comment-13941910
 ] 

ASF subversion and git services commented on SOLR-5865:
---

Commit 1579679 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1579679 ]

SOLR-5865: Un@Ignore test again.







[jira] [Commented] (SOLR-5890) Delete silently fails if not sent to shard where document was added

2014-03-20 Thread Xavier Riley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941897#comment-13941897
 ] 

Xavier Riley commented on SOLR-5890:


Yes, the router is set to "implicit".

> Delete silently fails if not sent to shard where document was added
> ---
>
> Key: SOLR-5890
> URL: https://issues.apache.org/jira/browse/SOLR-5890
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.7
> Environment: Debian 7.4.
>Reporter: Peter Inglesby
> Fix For: 4.8, 5.0, 4.7.1
>
>
> We have SolrCloud set up with two shards, each with a leader and a replica.  
> We use haproxy to distribute requests between the four nodes.
> Regardless of which node we send an add request to, following a commit, the 
> newly-added document is returned in a search, as expected.
> However, we can only delete a document if the delete request is sent to a 
> node in the shard where the document was added.  If we send the delete 
> request to a node in the other shard (and then send a commit) the document is 
> not deleted.  Such a delete request will get a 200 response, with the 
> following body:
>   {'responseHeader'=>{'status'=>0,'QTime'=>7}}
> Apart from the very low QTime, this is indistinguishable from a 
> successful delete.






[jira] [Commented] (SOLR-5860) Logging around core wait for state during startup / recovery is confusing

2014-03-20 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941893#comment-13941893
 ] 

Shalin Shekhar Mangar commented on SOLR-5860:
-

Yes, the patch looks good to me. I'll commit after running all tests.

> Logging around core wait for state during startup / recovery is confusing
> -
>
> Key: SOLR-5860
> URL: https://issues.apache.org/jira/browse/SOLR-5860
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Timothy Potter
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Attachments: SOLR-5860.patch
>
>
> I'm seeing some log messages like this:
> I was asked to wait on state recovering for HOST:8984_solr but I still do not 
> see the requested state. I see state: recovering live:true
> This is very confusing because from the log, it seems like it's waiting to 
> see the state it's in ... After digging through the code, it appears that it 
> is really waiting for a leader to become active so that it has a leader to 
> recover from.
> I'd like to improve the logging around this critical wait loop to give better 
> context to what is happening. 
> Also, I would like to change the following so that we force state updates 
> every 15 seconds for the entire wait period.
> -  if (retry == 15 || retry == 60) {
> +  if (retry % 15 == 0) {
> As-is, it's waiting 120 seconds but only forcing the state to update twice, 
> once after 15 seconds and again after 60 … might be good to force updates for 
> the full wait period.
> Lastly, I think it would be good to use the leaderConflictResolveWait setting 
> (from ZkController) here as well since 120 may not be enough for a leader to 
> become active in a busy cluster, esp. after the node the Overseer is running 
> on. Maybe leaderConflictResolveWait + 5 seconds?
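To see the effect of the proposed one-line change, here is a standalone sketch (not the actual ZkController loop) counting how often each condition forces a state update across a 120-iteration wait:

```java
// Standalone sketch comparing the old and proposed forced-update triggers
// over a 120-iteration wait loop (assumes retry counts 1..120; the real
// Solr loop may differ).
public class RetryTriggers {
    public static int count(boolean proposed) {
        int fires = 0;
        for (int retry = 1; retry <= 120; retry++) {
            boolean force = proposed ? (retry % 15 == 0)
                                     : (retry == 15 || retry == 60);
            if (force) fires++;
        }
        return fires;
    }

    public static void main(String[] args) {
        System.out.println("old:      " + count(false)); // forces twice
        System.out.println("proposed: " + count(true));  // forces every 15 retries
    }
}
```

Under those assumptions the old condition fires 2 times and the proposed one fires 8 times over the full wait period.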






[jira] [Updated] (SOLR-5749) Implement an Overseer status API

2014-03-20 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-5749:


Attachment: SOLR-5749.patch

This patch adds tracking of the 10 most recent failures (with the entire 
request/response) for each Collection API action. I think this, along with the 
requeststatus API added in SOLR-5477, removes the need to expose entire logs.

This can be committed now. In order to write/read stats from ZK, we need to be 
able to serialize Timer and related classes. I shall do that via a different 
issue.

> Implement an Overseer status API
> 
>
> Key: SOLR-5749
> URL: https://issues.apache.org/jira/browse/SOLR-5749
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
> Fix For: 5.0
>
> Attachments: SOLR-5749.patch, SOLR-5749.patch, SOLR-5749.patch, 
> SOLR-5749.patch, SOLR-5749.patch
>
>
> Right now there is little to no information exposed about the overseer from 
> SolrCloud.
> I propose that we have an API for overseer status which can return:
> # Past N commands executed (grouped by command type)
> # Status (queue-size, current overseer leader node)
> # Overseer log






[jira] [Commented] (SOLR-5860) Logging around core wait for state during startup / recovery is confusing

2014-03-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941884#comment-13941884
 ] 

Mark Miller commented on SOLR-5860:
---

I've touched on this area working on SOLR-5884 as well - this is more 
thoughtful stuff though, so would be great to get it in before I commit 
SOLR-5884.







[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-03-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941876#comment-13941876
 ] 

Mark Miller commented on SOLR-5865:
---

bq.  I think the main issue is the zkHost sys pro

Hmm - that is not it either - the MiniSolrCloudCluster will clear those on 
shutdown as well. I'm still seeing some leakage somehow though.

bq. Can you just use SystemPropertiesRestoreRule?

Let me give that a try.







[jira] [Created] (LUCENE-5541) FileExistsCachingDirectory, to work around unreliable File.exists

2014-03-20 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5541:
--

 Summary: FileExistsCachingDirectory, to work around unreliable 
File.exists
 Key: LUCENE-5541
 URL: https://issues.apache.org/jira/browse/LUCENE-5541
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Michael McCandless


File.exists is a dangerous method in Java, because if there is a
low-level IOException (permission denied, out of file handles, etc.)
the method can return false when it should return true.

Fortunately, as of Lucene 4.x, we rely much less on File.exists,
because we track which files the codec components created, and we know
those files then exist.

But, unfortunately, going from 3.0.x to 3.6.x, we increased our
reliance on File.exists, e.g. when creating CFS we check File.exists
on each sub-file before trying to add it, and I have a customer
corruption case where apparently a transient low level IOE caused
File.exists to incorrectly return false for one of the sub-files.  It
results in corruption like this:

{noformat}
  java.io.FileNotFoundException: No sub-file with id .fnm found 
(fileName=_1u7.cfs files: [.tis, .tii, .frq, .prx, .fdt, .nrm, .fdx])
  
org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157)
  
org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:146)
  org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
  org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212)
  
org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228)
  org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1161)
{noformat}

I think typically local file systems don't often hit such low level
errors, but if you have an index on a remote filesystem, where network
hiccups can cause problems, it's more likely.

As a simple workaround, I created a basic Directory delegator that
holds a Set of all created but not deleted files, and short-circuits
fileExists to return true if the file is in that set.
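A minimal sketch of that idea (heavily simplified; the real class would delegate the full Lucene 3.6 Directory API, and the names here are illustrative, not from the actual patch):

```java
import java.io.File;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Simplified sketch of the workaround: remember every file we created and
// have not yet deleted, and trust that memory over File.exists, which can
// transiently return false on low-level IO errors (permissions, fd limits).
public class FileExistsCache {
    private final Set<String> created =
            Collections.synchronizedSet(new HashSet<String>());

    public void onCreate(String name) { created.add(name); }
    public void onDelete(String name) { created.remove(name); }

    public boolean fileExists(File dir, String name) {
        // Short-circuit: if we created it and never deleted it, it exists.
        if (created.contains(name)) return true;
        return new File(dir, name).exists();
    }
}
```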

I don't plan to commit this: we aren't doing bug-fix releases on
3.6.x anymore (it's very old by now), and this problem is already
"fixed" in 4.x (by reducing our reliance on File.exists), but I wanted
to post the code here in case others hit it.  It looks like it has been hit
elsewhere too, e.g. https://netbeans.org/bugzilla/show_bug.cgi?id=189571 and
https://issues.jboss.org/browse/ISPN-2981 







[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-03-20 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941873#comment-13941873
 ] 

Alan Woodward commented on SOLR-5865:
-

Can you just use SystemPropertiesRestoreRule?

> Provide a MiniSolrCloudCluster to enable easier testing
> ---
>
> Key: SOLR-5865
> URL: https://issues.apache.org/jira/browse/SOLR-5865
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.7, 5.0
>Reporter: Gregory Chanan
>Assignee: Mark Miller
> Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch
>
>
> Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, 
> which has a couple of issues around support for downstream projects:
> - It's difficult to test SolrCloud support in a downstream project that may 
> have its own test framework.  For example, some projects have support for 
> different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests 
> against each of the different backends.  This is difficult to do cleanly, 
> because the Solr tests require derivation from LuceneTestCase, while the 
> other don't
> - The LuceneTestCase class hierarchy is really designed for internal solr 
> tests (e.g. it randomizes a lot of parameters to get test coverage, but a 
> downstream project probably doesn't care about that).  It's also quite 
> complicated and dense, much more so than a downstream project would want.
> Given these reasons, it would be nice to provide a simple 
> "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or 
> HBase provides a MiniHBaseCluster.






[jira] [Commented] (SOLR-5890) Delete silently fails if not sent to shard where document was added

2014-03-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941865#comment-13941865
 ] 

Mark Miller commented on SOLR-5890:
---

If you look at the SolrAdmin cloud section, under the zk tree view, what router 
impl do you see in clusterstate.json? Is it implicit by any chance?

> Delete silently fails if not sent to shard where document was added
> ---
>
> Key: SOLR-5890
> URL: https://issues.apache.org/jira/browse/SOLR-5890
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.7
> Environment: Debian 7.4.
>Reporter: Peter Inglesby
> Fix For: 4.8, 5.0, 4.7.1
>
>
> We have SolrCloud set up with two shards, each with a leader and a replica.  
> We use haproxy to distribute requests between the four nodes.
> Regardless of which node we send an add request to, following a commit, the 
> newly-added document is returned in a search, as expected.
> However, we can only delete a document if the delete request is sent to a 
> node in the shard where the document was added.  If we send the delete 
> request to a node in the other shard (and then send a commit) the document is 
> not deleted.  Such a delete request will get a 200 response, with the 
> following body:
>   {'responseHeader'=>{'status'=>0,'QTime'=>7}}
> Apart from the very low QTime, this is indistinguishable from a 
> successful delete.
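In SolrCloud's default (compositeId) routing, both adds and deletes-by-id are assigned to a shard by hashing the document id, so a forwarded delete should land on the same shard as the original add. The following is a minimal, self-contained sketch of that invariant; the `ShardRouter` class is hypothetical, and `String.hashCode()` stands in for the MurmurHash3-over-hash-ranges scheme Solr actually uses:

```java
// Simplified sketch of hash-based document routing in SolrCloud.
// Solr's compositeId router really uses MurmurHash3 over the id plus
// per-shard hash ranges; String.hashCode() here is only a stand-in.
public class ShardRouter {
    // Map a document id to one of numShards shards deterministically.
    public static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int addShard = shardFor("doc42", 2);
        int deleteShard = shardFor("doc42", 2);
        // The same id always hashes to the same shard, so a delete-by-id
        // routed by hash reaches the shard that holds the document.
        System.out.println(addShard == deleteShard); // prints "true"
    }
}
```

With the implicit router, by contrast, documents are not hashed at all, and a delete has to be sent to (or explicitly routed at) the shard that holds the document — which is why the router impl in clusterstate.json matters here.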






[jira] [Updated] (SOLR-5890) Delete silently fails if not sent to shard where document was added

2014-03-20 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5890:
--

Fix Version/s: 4.7.1
   5.0
   4.8

> Delete silently fails if not sent to shard where document was added
> ---
>
> Key: SOLR-5890
> URL: https://issues.apache.org/jira/browse/SOLR-5890
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.7
> Environment: Debian 7.4.
>Reporter: Peter Inglesby
> Fix For: 4.8, 5.0, 4.7.1
>
>
> We have SolrCloud set up with two shards, each with a leader and a replica.  
> We use haproxy to distribute requests between the four nodes.
> Regardless of which node we send an add request to, following a commit, the 
> newly-added document is returned in a search, as expected.
> However, we can only delete a document if the delete request is sent to a 
> node in the shard where the document was added.  If we send the delete 
> request to a node in the other shard (and then send a commit) the document is 
> not deleted.  Such a delete request will get a 200 response, with the 
> following body:
>   {'responseHeader'=>{'status'=>0,'QTime'=>7}}
> Apart from the very low QTime, this is indistinguishable from a 
> successful delete.






[jira] [Commented] (SOLR-5890) Delete silently fails if not sent to shard where document was added

2014-03-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941862#comment-13941862
 ] 

Mark Miller commented on SOLR-5890:
---

Very strange - the request should be forwarded. This will be interesting.

> Delete silently fails if not sent to shard where document was added
> ---
>
> Key: SOLR-5890
> URL: https://issues.apache.org/jira/browse/SOLR-5890
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.7
> Environment: Debian 7.4.
>Reporter: Peter Inglesby
> Fix For: 4.8, 5.0, 4.7.1
>
>
> We have SolrCloud set up with two shards, each with a leader and a replica.  
> We use haproxy to distribute requests between the four nodes.
> Regardless of which node we send an add request to, following a commit, the 
> newly-added document is returned in a search, as expected.
> However, we can only delete a document if the delete request is sent to a 
> node in the shard where the document was added.  If we send the delete 
> request to a node in the other shard (and then send a commit) the document is 
> not deleted.  Such a delete request will get a 200 response, with the 
> following body:
>   {'responseHeader'=>{'status'=>0,'QTime'=>7}}
> Apart from the very low QTime, this is indistinguishable from a 
> successful delete.






[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-20 Thread Gopal Patwa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941860#comment-13941860
 ] 

Gopal Patwa commented on SOLR-4787:
---

Thanks Kranti, here is my use case:

Event Collection:
eventId=1
title=Lady Gaga
date=06/03/2014

EventTicketStats Collection
eventId=1
minPrice=200
minQuantity=5

When a user searches for "lady gaga" on an Event document using hjoin with 
EventTicketStats, the result should include the min price and quantity data 
from the joined core.

Final Result for Event Collection:
eventId=1
title=Lady Gaga
date=06/03/2014
minPrice=200
minQuantity=5

The user also has the option to filter results by price and quantity, e.g. 
show events with minPrice < 100.
The reason we keep EventTicketStats in a separate document is that our ticket 
data changes every 5 seconds, while Event data changes only about twice a day.

I considered using updatable numeric DocValues after denormalizing the Event 
document with min price and quantity fields, but Solr does not support that 
feature yet, so I need to rely on a join.
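Under the hjoin syntax described in this issue, the "events with minPrice < 100" filter above might be built like the following. The collection and field names come from the example; the exact parameter layout is an assumption based on this contrib's description, not a tested API (and note hjoin only filters the Event results — it does not merge EventTicketStats fields into them):

```java
// Sketch: assembling an hjoin filter query string for the use case above.
// Treat the parameter layout as an assumption taken from this contrib's
// description; the helper class is hypothetical, not part of Solr.
public class HjoinQueryExample {
    // fromQuery runs against fromIndex; matching join keys filter the main core.
    public static String buildFilter(String fromIndex, String key, String fromQuery) {
        return "{!hjoin fromIndex=" + fromIndex
                + " from=" + key + " to=" + key + "}" + fromQuery;
    }

    public static void main(String[] args) {
        // Filter Event results to events whose ticket stats show minPrice < 100
        // ("[* TO 100}" is Solr's exclusive-upper-bound range syntax).
        String fq = buildFilter("EventTicketStats", "eventId", "minPrice:[* TO 100}");
        System.out.println("q=title:\"lady gaga\"&fq=" + fq);
    }
}
```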


> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.8
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will set up a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex, then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query a

[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-03-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941857#comment-13941857
 ] 

Mark Miller commented on SOLR-5865:
---

Thanks Greg - I think the main issue is the zkHost sys prop - I've added the 
following as well:

System.clearProperty("solr.solrxml.location");
System.clearProperty("zkHost");

That's one complication of avoiding the test framework - normally there are 
checks applied for this type of thing, and the test will fail if you violate 
them and tell you which sys props were not reset or which threads were not 
stopped, etc.
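Outside the Lucene test framework nothing restores system properties for you, so a pattern like the following keeps one test's zkHost / solr.solrxml.location settings from leaking into the next. This is a generic sketch (the `SysPropGuard` class is hypothetical, not MiniSolrCloudCluster code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: save and restore system properties around a test, since no
// framework rule does it for us. Generic pattern, not Solr test code.
public class SysPropGuard implements AutoCloseable {
    private final Map<String, String> saved = new HashMap<>();

    public SysPropGuard(String... keys) {
        for (String key : keys) {
            saved.put(key, System.getProperty(key)); // null if unset
        }
    }

    @Override
    public void close() {
        for (Map.Entry<String, String> e : saved.entrySet()) {
            if (e.getValue() == null) {
                System.clearProperty(e.getKey());   // was unset: clear it again
            } else {
                System.setProperty(e.getKey(), e.getValue()); // restore old value
            }
        }
    }

    public static void main(String[] args) {
        try (SysPropGuard guard = new SysPropGuard("zkHost")) {
            System.setProperty("zkHost", "localhost:2181"); // the test sets it
        } // property restored (cleared) here
        // prints "null" when zkHost was unset beforehand
        System.out.println(System.getProperty("zkHost"));
    }
}
```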

> Provide a MiniSolrCloudCluster to enable easier testing
> ---
>
> Key: SOLR-5865
> URL: https://issues.apache.org/jira/browse/SOLR-5865
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.7, 5.0
>Reporter: Gregory Chanan
>Assignee: Mark Miller
> Attachments: SOLR-5865.patch, SOLR-5865.patch, SOLR-5865addendum.patch
>
>
> Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, 
> which has a couple of issues around support for downstream projects:
> - It's difficult to test SolrCloud support in a downstream project that may 
> have its own test framework.  For example, some projects have support for 
> different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests 
> against each of the different backends.  This is difficult to do cleanly, 
> because the Solr tests require derivation from LuceneTestCase, while the 
> others don't.
> - The LuceneTestCase class hierarchy is really designed for internal Solr 
> tests (e.g. it randomizes a lot of parameters to get test coverage, but a 
> downstream project probably doesn't care about that).  It's also quite 
> complicated and dense, much more so than a downstream project would want.
> Given these reasons, it would be nice to provide a simple 
> "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or 
> HBase provides a MiniHBaseCluster.






[jira] [Created] (SOLR-5891) Problems installing Apache Solr with Apache Tomcat

2014-03-20 Thread Dean Zambrano (JIRA)
Dean Zambrano created SOLR-5891:
---

 Summary: Problems installing Apache Solr with Apache Tomcat
 Key: SOLR-5891
 URL: https://issues.apache.org/jira/browse/SOLR-5891
 Project: Solr
  Issue Type: Bug
  Components: clients - java, SolrCloud
Affects Versions: 4.7
 Environment: Centos 6.5 Installing Apache Tomcat 7.0.52 with Apache 
Solr 4.7.0
Reporter: Dean Zambrano
 Fix For: 4.7


I installed Apache Solr 4.7.0. As part of the install, I performed the 
following (based on these instructions: 
http://sachkadam.wordpress.com/2013/04/29/solr-installation-on-centos-6/). 
The summary is as follows:

-  I moved the "/example/solr" directory to /opt/solr. 
-  I created a "solr.xml" file which contains the following XML code:
# more solr.xml






The "solr.xml" file is located in: 
/usr/share/apache-tomcat-7.0.52/conf/Catalina/localhost

When I try to access Solr through the following URL: 
http://107.170.94.202:8983, I receive the following error: 

{msg=SolrCore 'collection1' is not available due to init failure: Could not 
load config file 
/root/solr-4.7.0/example/solr/collection1/solrconfig.xml,trace=org.apache.solr.common.SolrException:
 SolrCore 'collection1' is not available due to init failure: Could not load 
config file /root/solr-4.7.0/example/solr/collection1/solrconfig.xml
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:827)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:317)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.solr.common.SolrException: Could not load config file 
/root/solr-4.7.0/example/solr/collection1/solrconfig.xml
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:530)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in 
classp

Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 541 - Still Failing

2014-03-20 Thread Michael McCandless
I committed a fix.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Mar 19, 2014 at 6:33 PM, Apache Jenkins Server
 wrote:
> Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/541/
>
> 1 tests failed.
> REGRESSION:  
> org.apache.lucene.replicator.LocalReplicatorTest.testObtainMissingFile
>
> Error Message:
> /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-NightlyTests-4.x/lucene/build/replicator/test/J1/index3004630048tmp/madeUpFile
>
> Stack Trace:
> java.nio.file.NoSuchFileException: 
> /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-NightlyTests-4.x/lucene/build/replicator/test/J1/index3004630048tmp/madeUpFile
> at 
> __randomizedtesting.SeedInfo.seed([441C78757690D0BA:621FD230CEC36F9D]:0)
> at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:176)
> at java.nio.channels.FileChannel.open(FileChannel.java:287)
> at java.nio.channels.FileChannel.open(FileChannel.java:334)
> at 
> org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:82)
> at 
> org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
> at 
> org.apache.lucene.replicator.IndexRevision.open(IndexRevision.java:136)
> at 
> org.apache.lucene.replicator.LocalReplicator.obtainFile(LocalReplicator.java:198)
> at 
> org.apache.lucene.replicator.LocalReplicatorTest.testObtainMissingFile(LocalReplicatorTest.java:155)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at 
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAsse

[jira] [Commented] (SOLR-5880) org.apache.solr.client.solrj.impl.CloudSolrServerTest is failing pretty much every time for a long time with an exception about not being able to connect to ZooKeeper wi

2014-03-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941854#comment-13941854
 ] 

Mark Miller commented on SOLR-5880:
---

They were related to SOLR-5865.

> org.apache.solr.client.solrj.impl.CloudSolrServerTest is failing pretty much 
> every time for a long time with an exception about not being able to connect 
> to ZooKeeper within the timeout.
> --
>
> Key: SOLR-5880
> URL: https://issues.apache.org/jira/browse/SOLR-5880
> Project: Solr
>  Issue Type: Test
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.8, 5.0
>
>
> This test is failing consistently, though currently only on Policeman Jenkins 
> servers.






Re: Google Summer of Code

2014-03-20 Thread Furkan KAMACI
Hi Michael;

Thanks for the explanation.

Furkan KAMACI


2014-03-20 17:12 GMT+02:00 Michael McCandless :

> Unfortunately, the only two GSoC mentors we seem to have this year are
> David Smiley and myself, and we each are already signed up to mentor
> one student, and there's at least two other students expressing
> interest in different issues.
>
> So it looks like we have too many students and too few mentors.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Mar 20, 2014 at 9:50 AM, Furkan KAMACI 
> wrote:
> > Hi;
> >
> > I want to apply for Google Summer of Code if I can still make the deadline.
> > I've checked the issues. Is there any issue labeled for GSoC that has a
> > volunteer mentor but no applicant yet? I ask because I see comments on some
> > issues looking for volunteer mentors. If there is such an issue, I would be
> > glad to work on it.
> >
> > Thanks;
> > Furkan KAMACI
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble

2014-03-20 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941852#comment-13941852
 ] 

Shawn Heisey commented on SOLR-5852:


I just made that patch a few hours ago!

New patch coming up as soon as I work my way through the conflicts.


> Add CloudSolrServer helper method to connect to a ZK ensemble
> -
>
> Key: SOLR-5852
> URL: https://issues.apache.org/jira/browse/SOLR-5852
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
> Attachments: SOLR-5852-SH.patch, SOLR-5852.patch, SOLR-5852_FK.patch, 
> SOLR-5852_FK.patch
>
>
> We should have a CloudSolrServer constructor which takes a list of ZK servers 
> to connect to.
> Something Like 
> {noformat}
> public CloudSolrServer(String... zkHost);
> {noformat}
> - Document the current constructor better to mention that to connect to a ZK 
> ensemble you can pass a comma-delimited list of ZK servers like 
> zk1:2181,zk2:2181,zk3:2181
> - Thirdly should getLbServer() and getZKStatereader() be public?
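Internally, a varargs constructor like the one proposed could simply collapse the hosts into the comma-delimited ensemble string the existing single-String constructor already accepts. A minimal sketch (the `ZkHostJoiner` helper is hypothetical; CloudSolrServer itself is not touched here):

```java
// Sketch: what a CloudSolrServer(String... zkHost) constructor could do
// internally - join the hosts into the comma-delimited ensemble string
// that the existing single-String constructor already accepts.
public class ZkHostJoiner {
    public static String ensembleString(String... zkHosts) {
        return String.join(",", zkHosts);
    }

    public static void main(String[] args) {
        System.out.println(ensembleString("zk1:2181", "zk2:2181", "zk3:2181"));
        // prints "zk1:2181,zk2:2181,zk3:2181"
    }
}
```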






[jira] [Updated] (SOLR-5890) Delete silently fails if not sent to shard where document was added

2014-03-20 Thread Peter Inglesby (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Inglesby updated SOLR-5890:
-

Summary: Delete silently fails if not sent to shard where document was 
added  (was: Delete silently fails if not sent to node where document was added)

> Delete silently fails if not sent to shard where document was added
> ---
>
> Key: SOLR-5890
> URL: https://issues.apache.org/jira/browse/SOLR-5890
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.7
> Environment: Debian 7.4.
>Reporter: Peter Inglesby
>
> We have SolrCloud set up with two shards, each with a leader and a replica.  
> We use haproxy to distribute requests between the four nodes.
> Regardless of which node we send an add request to, following a commit, the 
> newly-added document is returned in a search, as expected.
> However, we can only delete a document if the delete request is sent to a 
> node in the shard where the document was added.  If we send the delete 
> request to a node in the other shard (and then send a commit) the document is 
> not deleted.  Such a delete request will get a 200 response, with the 
> following body:
>   {'responseHeader'=>{'status'=>0,'QTime'=>7}}
> Apart from the very low QTime, this is indistinguishable from a 
> successful delete.






[jira] [Deleted] (SOLR-5889) aaaa

2014-03-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler deleted SOLR-5889:



> 
> 
>
> Key: SOLR-5889
> URL: https://issues.apache.org/jira/browse/SOLR-5889
> Project: Solr
>  Issue Type: Bug
>Reporter: linxiaohu
>







[jira] [Created] (SOLR-5890) Delete silently fails if not sent to node where document was added

2014-03-20 Thread Peter Inglesby (JIRA)
Peter Inglesby created SOLR-5890:


 Summary: Delete silently fails if not sent to node where document 
was added
 Key: SOLR-5890
 URL: https://issues.apache.org/jira/browse/SOLR-5890
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7
 Environment: Debian 7.4.
Reporter: Peter Inglesby


We have SolrCloud set up with two shards, each with a leader and a replica.  We 
use haproxy to distribute requests between the four nodes.

Regardless of which node we send an add request to, following a commit, the 
newly-added document is returned in a search, as expected.

However, we can only delete a document if the delete request is sent to a node 
in the shard where the document was added.  If we send the delete request to a 
node in the other shard (and then send a commit) the document is not deleted.  
Such a delete request will get a 200 response, with the following body:

  {'responseHeader'=>{'status'=>0,'QTime'=>7}}

Apart from the very low QTime, this is indistinguishable from a successful 
delete.






[jira] [Updated] (SOLR-5394) facet.method=fcs seems to be using threads when it shouldn't

2014-03-20 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-5394:
-

Attachment: SOLR-5394.patch

This patch sets the default threads to 1 (single thread execution) as per 
Vitaly's suggestion. Fixed the test case with unspecified threads parameter: 
the number of threads is expected to be the default (=1). The tests in 
TestSimpleFacet pass.

> facet.method=fcs seems to be using threads when it shouldn't
> 
>
> Key: SOLR-5394
> URL: https://issues.apache.org/jira/browse/SOLR-5394
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Michael McCandless
> Attachments: SOLR-5394.patch, SOLR-5394.patch, 
> SOLR-5394_keep_threads_original_value.patch
>
>
> I built a wikipedia index, with multiple fields for faceting.
> When I do facet.method=fcs with facet.field=dateFacet and 
> facet.field=userNameFacet, and then kill -QUIT the java process, I see a 
> bunch (46, I think) of facetExecutor-7-thread-N threads had spun up.
> But I thought threads for each field is turned off by default?
> Even if I add facet.threads=0, it still spins up all the threads.
> I think something is wrong in SimpleFacets.parseParams; somehow, that method 
> returns early (because localParams) is null, leaving threads=-1, and then the 
> later code that would have set threads to 0 never runs.
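The bug described above (an early return leaving a threads=-1 sentinel that later reads as "spin up threads") reduces to a sentinel-defaulting mistake. The following is a minimal sketch of the pattern and the fix, under my reading of the report — it is not the actual SimpleFacets code:

```java
// Sketch of the sentinel bug described above, not SimpleFacets itself.
// threads starts at the sentinel -1; an early-return path (localParams
// null) can skip the code that maps the sentinel to the default of 0.
public class ThreadsParamExample {
    static int parseThreads(String localParams, Integer requested) {
        int threads = -1; // sentinel: "not yet resolved"
        if (localParams != null && requested != null) {
            threads = requested;
        }
        // Fix: always resolve the sentinel before returning, so the null
        // localParams path cannot leak -1 (interpreted as "use threads").
        if (threads < 0) {
            threads = 0; // 0 = single-threaded faceting, the intended default
        }
        return threads;
    }

    public static void main(String[] args) {
        System.out.println(parseThreads(null, 6));       // prints "0"
        System.out.println(parseThreads("{!key=x}", 6)); // prints "6"
    }
}
```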






Re: Google Summer of Code

2014-03-20 Thread Michael McCandless
Unfortunately, the only two GSoC mentors we seem to have this year are
David Smiley and myself, and we each are already signed up to mentor
one student, and there's at least two other students expressing
interest in different issues.

So it looks like we have too many students and too few mentors.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Mar 20, 2014 at 9:50 AM, Furkan KAMACI  wrote:
> Hi;
>
> I want to apply for Google Summer of Code if I can still make the deadline.
> I've checked the issues. Is there any issue labeled for GSoC that has a
> volunteer mentor but no applicant yet? I ask because I see comments on some
> issues looking for volunteer mentors. If there is such an issue, I would be
> glad to work on it.
>
> Thanks;
> Furkan KAMACI




[jira] [Commented] (SOLR-5824) Merge up Solr MapReduce contrib code to latest external changes.

2014-03-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941793#comment-13941793
 ] 

ASF subversion and git services commented on SOLR-5824:
---

Commit 1579648 from [~markrmil...@gmail.com] in branch 
'dev/branches/lucene_solr_4_7'
[ https://svn.apache.org/r1579648 ]

SOLR-5824: Merge up Solr MapReduce contrib code to latest external changes. 
Includes a few minor bug fixes.

> Merge up Solr MapReduce contrib code to latest external changes.
> 
>
> Key: SOLR-5824
> URL: https://issues.apache.org/jira/browse/SOLR-5824
> Project: Solr
>  Issue Type: Task
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.8, 5.0, 4.7.1
>
> Attachments: SOLR-5824.patch
>
>
> There are a variety of changes in the mapreduce contrib code that have 
> occurred while getting the initial stuff committed - they need to be merged 
> in.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter

2014-03-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941779#comment-13941779
 ] 

Robert Muir commented on LUCENE-4984:
-

It's even simpler than that, but I wanted to do that in a follow-up issue. 4.8 
is a good time to fix it, as it's easy with this tokenizer!

> Fix ThaiWordFilter
> --
>
> Key: LUCENE-4984
> URL: https://issues.apache.org/jira/browse/LUCENE-4984
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch
>
>
> ThaiWordFilter is an offender in TestRandomChains because it creates 
> positions and updates offsets.






[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-03-20 Thread Da Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941775#comment-13941775
 ] 

Da Huang commented on LUCENE-4396:
--

Sorry for my late reply. I have been thinking about the new code/design on 
trunk these days. 

The new code breaks BulkScorer out of Scorer, and it is necessary to create a 
new BooleanScorer (a Scorer), just as you said. I'm afraid we do have to take 
Scorer as the subScorer type in the new BooleanScorer. And yes: 
BooleanBulkScorer should not be embedded, as its docIDs are out of order. My 
idea is to keep BooleanBulkScorer supporting only the no-MUST-clause case, and 
to let the new BooleanScorer deal with the case where there is at least one 
MUST clause. I think this is one of the best ways to stay compatible with the 
current design.

Besides, I'm afraid the name BulkScorer may be confusing: the new 
BooleanScorer is also implemented by scoring a range of documents at once, yet 
it can act as a sub-Scorer.
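The BulkScorer/Scorer split discussed above can be sketched roughly as follows. This is a hypothetical, simplified model: the names `SimpleScorer`, `SimpleBulkScorerSketch`, and `SimpleCollector` are invented for illustration and are not Lucene's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Invented, minimal stand-ins for Lucene's Scorer/BulkScorer/Collector.
// A Scorer advances one matching document at a time; a bulk scorer scores
// a whole range of documents in one call, which is what lets a
// BooleanScorer-style implementation batch its work (and why the docIDs
// it emits need not arrive in strictly increasing order).
class SimpleBulkScorerSketch {

    interface SimpleScorer {
        int nextDoc();   // next matching docID, or -1 when exhausted
        float score();   // score of the current document
    }

    interface SimpleCollector {
        void collect(int doc, float score);
    }

    // Wraps a document-at-a-time scorer in the bulk interface.
    static void scoreRange(SimpleScorer in, SimpleCollector out, int maxDoc) {
        for (int doc = in.nextDoc(); doc != -1 && doc < maxDoc; doc = in.nextDoc()) {
            out.collect(doc, in.score());
        }
    }

    // Tiny array-backed scorer, useful for exercising the sketch.
    static SimpleScorer fromArray(int[] docs) {
        return new SimpleScorer() {
            int i = -1;
            public int nextDoc() { return ++i < docs.length ? docs[i] : -1; }
            public float score() { return 1.0f; }
        };
    }

    static List<Integer> collectAll(int[] docs, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        scoreRange(fromArray(docs), (doc, score) -> hits.add(doc), maxDoc);
        return hits;
    }
}
```

In this model a MUST-capable BooleanScorer can still implement `SimpleScorer` and so be used as a sub-scorer, even if internally it scores ranges at a time.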

> BooleanScorer should sometimes be used for MUST clauses
> ---
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 100 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!






[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-20 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941765#comment-13941765
 ] 

Kranti Parisa commented on SOLR-4787:
-

Gopal, you can't get the values using joins. You will need to make a second 
call with the result (potentially sorted and paginated on firstCore). By using 
FQs in the first join call, you can hit the caches in the second call. If you 
need more details, describe your use case.

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.8
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will set up a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created, the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex, then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
>  class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And th
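The core of the hash-set join described above can be sketched as follows. This is an illustrative model only: the class and method names are invented, and the real plugin works against Lucene's FieldCache rather than plain arrays.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of a hash-set join: collect the long join keys
// produced by the fromIndex query into a HashSet, then keep only those
// main-query documents whose "to" key appears in the set.
class HashSetJoinSketch {
    static List<Integer> join(long[] fromKeys, int[] mainDocs, long[] mainToKeys) {
        // Pre-size the set so it never has to rehash; this mirrors the
        // *size* local parameter described in point 3 above.
        Set<Long> keys = new HashSet<>(fromKeys.length * 2);
        for (long k : fromKeys) {
            keys.add(k);
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < mainDocs.length; i++) {
            if (keys.contains(mainToKeys[i])) {
                hits.add(mainDocs[i]);
            }
        }
        return hits;
    }
}
```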

[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble

2014-03-20 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941760#comment-13941760
 ] 

Erick Erickson commented on SOLR-5852:
--

Hey Shawn:

I tried to apply your patch to a new checkout for trunk and had merge 
conflicts. It looks like the test code changed. Could you regenerate the patch?

Thanks!

> Add CloudSolrServer helper method to connect to a ZK ensemble
> -
>
> Key: SOLR-5852
> URL: https://issues.apache.org/jira/browse/SOLR-5852
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
> Attachments: SOLR-5852-SH.patch, SOLR-5852.patch, SOLR-5852_FK.patch, 
> SOLR-5852_FK.patch
>
>
> We should have a CloudSolrServer constructor which takes a list of ZK servers 
> to connect to.
> Something Like 
> {noformat}
> public CloudSolrServer(String... zkHost);
> {noformat}
> - Document the current constructor better to mention that to connect to a ZK 
> ensemble you can pass a comma-delimited list of ZK servers like 
> zk1:2181,zk2:2181,zk3:2181
> - Thirdly should getLbServer() and getZKStatereader() be public?
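A minimal sketch of how the proposed varargs constructor could delegate to the existing comma-delimited form. The helper name `ZkHostJoiner` is invented for illustration; the actual SolrJ patch may be structured differently.

```java
// Invented helper illustrating the proposal: join the varargs hosts into
// the comma-delimited string that the existing single-String constructor
// already understands (e.g. "zk1:2181,zk2:2181,zk3:2181").
class ZkHostJoiner {
    static String join(String... zkHosts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < zkHosts.length; i++) {
            if (i > 0) {
                sb.append(',');   // separator expected by the ZK client
            }
            sb.append(zkHosts[i]);
        }
        return sb.toString();
    }
}
```

A `CloudSolrServer(String... zkHost)` constructor could then simply call `this(ZkHostJoiner.join(zkHost))`.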






[jira] [Commented] (LUCENE-4984) Fix ThaiWordFilter

2014-03-20 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941752#comment-13941752
 ] 

Ryan Ernst commented on LUCENE-4984:


+1, patch lgtm

Is fixing Smart Chinese to not emit punctuation as simple as hardcoding the 
list of punctuation characters and skipping them in something like 
incrementWord()?
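One way such a punctuation check could look, sketched here with `java.lang.Character`'s Unicode general categories rather than an explicit hardcoded character list. The class and method names are invented; the real fix would presumably live inside the tokenizer's incrementWord().

```java
// Invented sketch: decide whether a token consists solely of punctuation,
// using Character's Unicode general categories instead of a hardcoded list.
class PunctuationCheck {
    static boolean isPunctuationOnly(String token) {
        if (token.isEmpty()) {
            return false;
        }
        for (int i = 0; i < token.length(); ) {
            int cp = token.codePointAt(i);
            switch (Character.getType(cp)) {
                case Character.CONNECTOR_PUNCTUATION:
                case Character.DASH_PUNCTUATION:
                case Character.START_PUNCTUATION:
                case Character.END_PUNCTUATION:
                case Character.INITIAL_QUOTE_PUNCTUATION:
                case Character.FINAL_QUOTE_PUNCTUATION:
                case Character.OTHER_PUNCTUATION:
                    break;          // punctuation: keep scanning
                default:
                    return false;   // any non-punctuation codepoint disqualifies
            }
            i += Character.charCount(cp);  // handle supplementary codepoints
        }
        return true;
    }
}
```

A tokenizer could drop any token for which this predicate is true instead of emitting it.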

> Fix ThaiWordFilter
> --
>
> Key: LUCENE-4984
> URL: https://issues.apache.org/jira/browse/LUCENE-4984
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Attachments: LUCENE-4984.patch, LUCENE-4984.patch, LUCENE-4984.patch
>
>
> ThaiWordFilter is an offender in TestRandomChains because it creates 
> positions and updates offsets.






Google Summer of Code

2014-03-20 Thread Furkan KAMACI
Hi;

I want to apply for Google Summer of Code if I can still make the deadline.
I've checked the issues. Is there any issue that is labeled for GSoC and has
a volunteer mentor, but that nobody has applied for yet? I ask because I see
comments on some issues asking for volunteer mentors. If there is such an
issue, I would appreciate the chance to work on it.

Thanks;
Furkan KAMACI


[jira] [Commented] (LUCENE-3122) Cascaded grouping

2014-03-20 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941727#comment-13941727
 ] 

Furkan KAMACI commented on LUCENE-3122:
---

[~mikemccand] Could you explain this issue a bit more?

> Cascaded grouping
> -
>
> Key: LUCENE-3122
> URL: https://issues.apache.org/jira/browse/LUCENE-3122
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Michael McCandless
>  Labels: gsoc2014
> Fix For: 4.8
>
>
> Similar to SOLR-2526, in that you are grouping on 2 separate fields, but 
> instead of treating those fields as a single grouping by a compound key, this 
> change would let you first group on key1 for the primary groups and then 
> secondarily on key2 within the primary groups.
> Ie, the result you get back would have groups A, B, C (grouped by key1) but 
> then the documents within group A would be grouped by key 2.
> I think this will be important for apps whose documents are the product of 
> denormalizing, ie where the Lucene document is really a sub-document of a 
> different identifier field.  Borrowing an example from LUCENE-3097, you have 
> doctors but each doctor may have multiple offices (addresses) where they 
> practice and so you index doctor X address as your lucene documents.  In this 
> case, your "identifier" field (that which "counts" for facets, and should be 
> "grouped" for presentation) is doctorid.  When you offer users search over 
> this index, you'd likely want to 1) group by distance (ie, < 0.1 miles, < 0.2 
> miles, etc., as a function query), but 2) also group by doctorid, ie cascaded 
> grouping.
> I suspect this would be easier to implement than it sounds: the per-group 
> collector used by the 2nd pass grouping collector for key1's grouping just 
> needs to be another grouping collector.  Spookily, though, that collection 
> would also have to be 2-pass, so it could get tricky since grouping is sort 
> of recursing on itself once we have LUCENE-3112, though, that should 
> enable efficient single pass grouping by the identifier (doctorid).
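The shape of the result described above can be sketched as a two-level grouping. This is a toy in-memory model with invented names; real Lucene grouping collectors do this in passes over the index rather than over materialized maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of cascaded grouping: primary groups keyed by key1 (e.g. the
// distance bucket), and within each primary group, secondary groups keyed
// by key2 (e.g. doctorid).
class CascadedGroupingSketch {
    static Map<String, Map<String, List<Integer>>> group(
            int[] docs, Map<Integer, String> key1, Map<Integer, String> key2) {
        Map<String, Map<String, List<Integer>>> result = new LinkedHashMap<>();
        for (int doc : docs) {
            result.computeIfAbsent(key1.get(doc), k -> new LinkedHashMap<>())
                  .computeIfAbsent(key2.get(doc), k -> new ArrayList<>())
                  .add(doc);
        }
        return result;
    }
}
```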






[jira] [Closed] (SOLR-5889) aaaa

2014-03-20 Thread linxiaohu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

linxiaohu closed SOLR-5889.
---

Resolution: Duplicate

> 
> 
>
> Key: SOLR-5889
> URL: https://issues.apache.org/jira/browse/SOLR-5889
> Project: Solr
>  Issue Type: Bug
>Reporter: linxiaohu
>







[jira] [Created] (SOLR-5889) aaaa

2014-03-20 Thread linxiaohu (JIRA)
linxiaohu created SOLR-5889:
---

 Summary: 
 Key: SOLR-5889
 URL: https://issues.apache.org/jira/browse/SOLR-5889
 Project: Solr
  Issue Type: Bug
Reporter: linxiaohu









[GitHub] lucene-solr pull request: Removal of Scorer.weight

2014-03-20 Thread shebiki
Github user shebiki commented on the pull request:

https://github.com/apache/lucene-solr/pull/40#issuecomment-38165595
  
Mikhail,

I have a similar use case and opted for creating the `BooleanScorer2` 
directly instead of trying to associate each child `Scorer` with the drilldown 
id. I chose not to use the 
[QueryWrapper](https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/facet/src/java/org/apache/lucene/facet/search/DrillSideways.java#L352)
 pattern from `DrillSideways` in 4.6.0 because I felt it would prevent future 
optimizations and it was no longer in use in 4.7. I didn't consider the idea of 
just comparing `scorer.getWeight().getQuery()`, but it's essentially the same 
workflow.

The reason I felt it prevented further optimization is that it stops a 
`Weight` instance from returning an already created child `Scorer`. 
For example:

* A `BooleanQuery` consisting of just `SHOULD` clauses with `disableCoord` 
set to `true`. If a segment has only one non-null scorer, then 
`BooleanWeight.scorer()` should be able to return that child scorer directly 
instead of wrapping it in another.
* Introduction of extra scoring metadata (imagine decorating each score 
with an additional `boolean`). In this case a composing query (a variant of 
`BooleanQuery`, `DisjunctionMaxQuery`, etc.) would want to aggregate this 
extra metadata at scoring time. If the metadata has a decent default value then 
only some of the child `Scorer`s will be able to provide it. If none of the 
child `Scorer`s provide this metadata, its calculation can probably be 
short-circuited and the query can just return a `BooleanScorer`, 
`ConjunctionScorer`, or `DisjunctionScorer` as needed. This would be more 
efficient than wrapping unconditionally.
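The unwrap optimization in the first bullet can be sketched as follows; the types and names are invented for illustration and are not Lucene's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Invented sketch of the first bullet: if, after dropping null sub-scorers,
// exactly one remains in a coord-disabled SHOULD-only disjunction, hand it
// back directly instead of wrapping it in a disjunction scorer.
class ScorerUnwrapSketch {

    interface SketchScorer { }

    static class DisjunctionSketchScorer implements SketchScorer {
        final List<SketchScorer> subs;
        DisjunctionSketchScorer(List<SketchScorer> subs) { this.subs = subs; }
    }

    static SketchScorer scorerFor(List<SketchScorer> subScorers) {
        List<SketchScorer> live = new ArrayList<>();
        for (SketchScorer s : subScorers) {
            if (s != null) {
                live.add(s);
            }
        }
        if (live.isEmpty()) {
            return null;            // no clause matches in this segment
        }
        if (live.size() == 1) {
            return live.get(0);     // the optimization: no wrapper needed
        }
        return new DisjunctionSketchScorer(live);
    }
}
```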

Quick question about your particular drill-sideways query: do you call 
`score()`, `freq()`, or something else to ensure the `SHOULD` `Scorer`s are 
correctly positioned? And do you optimize for the case where `BooleanQuery` 
returns a `DisjunctionScorer` and the child `Scorer`s are already positioned?

--Terry



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


