Problems installing PyLucene on Ubuntu 12.04
Hello,

I'm facing problems installing PyLucene on an Ubuntu 12.04 server (32-bit). Perhaps someone can give me some helpful advice? I've followed the official installation instructions [1]. It seems that building and installing JCC works fine. Also, running make to build PyLucene seems to succeed. But if I run make test, I get the errors attached below.

Thank you in advance!
Uwe

1: http://lucene.apache.org/pylucene/install.html

Output of make test (shortened):

[...]
==
ERROR: test_FieldEnumeration (__main__.PythonDirectoryTests)
--
Traceback (most recent call last):
  File "/root/pylucene-4.6.1-1/test/test_PyLucene.py", line 236, in test_FieldEnumeration
    self.test_indexDocument()
  File "/root/pylucene-4.6.1-1/test/test_PyLucene.py", line 84, in test_indexDocument
    self.closeStore(store, writer)
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "/usr/lib/python2.7/bdb.py", line 48, in trace_dispatch
    return self.dispatch_line(frame)
  File "/usr/lib/python2.7/bdb.py", line 67, in dispatch_line
    if self.quitting: raise BdbQuit
BdbQuit
==
ERROR: test_IncrementalLoop (__main__.PythonDirectoryTests)
--
Traceback (most recent call last):
  File "test/test_PythonDirectory.py", line 268, in test_IncrementalLoop
    self.test_indexDocument()
  File "/root/pylucene-4.6.1-1/test/test_PyLucene.py", line 84, in test_indexDocument
    self.closeStore(store, writer)
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "/usr/lib/python2.7/bdb.py", line 48, in trace_dispatch
    return self.dispatch_line(frame)
  File "/usr/lib/python2.7/bdb.py", line 67, in dispatch_line
    if self.quitting: raise BdbQuit
BdbQuit
==
ERROR: test_getFieldInfos (__main__.PythonDirectoryTests)
--
Traceback (most recent call last):
  File "/root/pylucene-4.6.1-1/test/test_PyLucene.py", line 282, in test_getFieldInfos
    self.test_indexDocument()
  File "/root/pylucene-4.6.1-1/test/test_PyLucene.py", line 84, in test_indexDocument
    self.closeStore(store, writer)
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "/usr/lib/python2.7/bdb.py", line 48, in trace_dispatch
    return self.dispatch_line(frame)
  File "/usr/lib/python2.7/bdb.py", line 67, in dispatch_line
    if self.quitting: raise BdbQuit
BdbQuit
==
ERROR: test_indexDocument (__main__.PythonDirectoryTests)
--
Traceback (most recent call last):
  File "/root/pylucene-4.6.1-1/test/test_PyLucene.py", line 84, in test_indexDocument
    self.closeStore(store, writer)
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "/usr/lib/python2.7/bdb.py", line 48, in trace_dispatch
    return self.dispatch_line(frame)
  File "/usr/lib/python2.7/bdb.py", line 67, in dispatch_line
    if self.quitting: raise BdbQuit
BdbQuit
==
ERROR: test_indexDocumentWithText (__main__.PythonDirectoryTests)
--
Traceback (most recent call last):
  File "/root/pylucene-4.6.1-1/test/test_PyLucene.py", line 112, in test_indexDocumentWithText
    self.closeStore(store, writer)
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "/usr/lib/python2.7/bdb.py", line 48, in trace_dispatch
    return self.dispatch_line(frame)
  File "/usr/lib/python2.7/bdb.py", line 67, in dispatch_line
    if self.quitting: raise BdbQuit
BdbQuit
==
ERROR: test_indexDocumentWithUnicodeText (__main__.PythonDirectoryTests)
--
Traceback (most recent call last):
  File "/root/pylucene-4.6.1-1/test/test_PyLucene.py", line 143, in test_indexDocumentWithUnicodeText
    self.closeStore(store, writer)
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in args:
  File "test/test_PythonDirectory.py", line 255, in closeStore
    for arg in
[jira] [Updated] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Audenaerde updated LUCENE-5476:
-----------------------------------
    Attachment: LUCENE-5476.patch

I have a patch ready that implements the random sampling using an override of {{getMatchingDocs()}}. It passes the test, so it should be OK :). It is slower, however (only a 3x speedup), but maybe there is room for optimization?

Exact:   168 ms
Sampled:  55 ms

Facet sampling
--------------
    Key: LUCENE-5476
    URL: https://issues.apache.org/jira/browse/LUCENE-5476
    Project: Lucene - Core
    Issue Type: Improvement
    Reporter: Rob Audenaerde
    Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java

With LUCENE-5339, facet sampling disappeared. When trying to display facet counts on large datasets (10M documents), counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this cost and thus provided a nice speedup. Could it be brought back?
[jira] [Commented] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922234#comment-13922234 ]

Shai Erera commented on LUCENE-5476:
------------------------------------

Thanks Rob. A few comments:

* I don't think that we need both totalHits and segmentHits. I know it may not look expensive, but if you think about these ++ for millions of hits, they add up. Instead, I think we should stick w/ the original per-segment totalHits and in RandomSamplingFC do a first pass to sum up the global totalHits.
** With that, I think no changes are needed to FacetsCollector?

About RandomSamplingFacetsCollector:

* I think we should fix the class jdoc's last Note as follows: "Note: if you use a counting {@link Facets} implementation, you can fix the sampled counts by calling ..."
* Also, I think instead of correctFacetCounts we should call it amortizeFacetCounts or something like that. We do not implement here the exact facet-counting method that existed before.
* I see that you removed sampleRatio, and now the ratio is computed as threshold/totalDocs, but I think that's ... wrong? I.e. if threshold=10 and totalHits=1000, I'll still get only 10 documents. But this is not what threshold means.
** I think we should have minSampleSize, below which we don't sample at all (that's the _threshold_).
** sampleRatio (e.g. 1%) is used only if totalHits > minSampleSize, and even then, we make sure that we sample at least minSampleSize.
** If we will have maxSampleSize as well, we can take that into account too, but it's OK if we don't do this in this issue.
* createSample seems to be memory-less -- i.e. it doesn't carry over the bin residue to the next segment. Not sure if it's critical though, but it might affect the total sample size. If you feel like getting closer to the optimum, want to fix it? Otherwise, can you please drop a TODO?
* Also, do you want to test using WAH8DocIdSet instead of FixedBitSet for the sampled docs? If not, could you please drop a TODO that we could use a more efficient bitset impl, since it's a sparse vector?

About the test:

* Could you remove the 's' from "collectors" in the test name?
* Could you move to numDocs being a random number -- something like atLeast(8000)?
* I don't mean to nitpick, but if you obtain an NRT reader, there's no need to commit() :)
* Make the two collector instances take 100/10% of the numDocs when you fix it.
* Maybe use random.nextInt(10) for the facets instead of alternating sequentially?
* I don't understand how you know that numChildren=5 when you ask for the 10 top children. Isn't it possible that w/ some random seed the number of children will be different?
** In fact, I think the random collectors should be initialized w/ a random seed that depends on the test? Currently they aren't, and so always use 0xdeadbeef?
* You have some sops left at the end of the test.
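[Editor's note: a minimal sketch of the sample-size rule Shai describes above -- minSampleSize as a floor below which no sampling happens, sampleRatio applied above it, and an optional maxSampleSize cap. The names follow the discussion, not the actual patch API.]

{code}
// Illustrative sketch only; parameter names are assumptions from the thread.
final class SampleSizeRule {
  private final int minSampleSize;   // below this, don't sample at all
  private final int maxSampleSize;   // optional upper cap
  private final double sampleRatio;  // e.g. 0.01 for 1%

  SampleSizeRule(int minSampleSize, int maxSampleSize, double sampleRatio) {
    this.minSampleSize = minSampleSize;
    this.maxSampleSize = maxSampleSize;
    this.sampleRatio = sampleRatio;
  }

  /** Returns the number of hits to sample, or totalHits if sampling isn't worthwhile. */
  int sampleSize(int totalHits) {
    if (totalHits <= minSampleSize) {
      return totalHits;                          // too few hits: keep them all
    }
    int size = (int) (totalHits * sampleRatio);  // normal case: apply the ratio
    size = Math.max(size, minSampleSize);        // but never go below the floor
    return Math.min(size, maxSampleSize);        // nor above the cap
  }
}
{code}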
[jira] [Assigned] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-5493:
-----------------------------------
    Assignee: Robert Muir

Rename Sorter, NumericDocValuesSorter, and fix javadocs
-------------------------------------------------------
    Key: LUCENE-5493
    URL: https://issues.apache.org/jira/browse/LUCENE-5493
    Project: Lucene - Core
    Issue Type: Bug
    Reporter: Robert Muir
    Assignee: Robert Muir

It's not clear to users that these are for the super-expert use case of pre-sorting the index. From the names and documentation, they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes.
[jira] [Created] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
Robert Muir created LUCENE-5493:
-----------------------------------
    Summary: Rename Sorter, NumericDocValuesSorter, and fix javadocs
    Key: LUCENE-5493
    URL: https://issues.apache.org/jira/browse/LUCENE-5493
    Project: Lucene - Core
    Issue Type: Bug
    Reporter: Robert Muir
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922280#comment-13922280 ]

Uwe Schindler commented on LUCENE-5493:
---------------------------------------

I agree. Initially, when I read the mail on the mailing list, I was confused about what the user was doing! I had no idea that he tried to mix IndexSorter's APIs with a custom collector, which is likely to fail.
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922283#comment-13922283 ]

Robert Muir commented on LUCENE-5493:
-------------------------------------

If you look at the javadocs for all of Lucene and then read the description of these classes, you can see how the user was easily confused. Because of the problems here, my first plan of attack will be to remove these classes from public view completely. If I cannot do that, I will rename them and add warnings.
[jira] [Commented] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922293#comment-13922293 ]

Rob Audenaerde commented on LUCENE-5476:
----------------------------------------

Thanks Shai, I have fixed the points you noted about the collector. I renamed the sampleThreshold to sampleSize. It currently picks a samplingRatio that will reduce the number of hits to the sampleSize, if the number of hits is greater.

I have a general question about your remarks on the test, besides fixing the obvious (names, commit, sops). Is there a reason to add more randomness to one test? I normally try to test one aspect per unit test. And if I also want to test some other aspect, like random document counts (to test the sample ratio, for example), I add more tests.

{quote}
Make the two collector instances take 100/10% of the numDocs when you fix it.
{quote}

Sorry, I don't get what you mean by this.

{quote}
I don't understand how you know that numChildren=5 when you ask for the 10 top children. Isn't it possible that w/ some random seed the number of children will be different?
In fact, I think the random collectors should be initialized w/ a random seed that depends on the test? Currently they aren't, and so always use 0xdeadbeef?
{quote}

There will be 5 facet values (0, 2, 4, 6 and 8), as only the even documents (i % 10) are hits. There is a REAL small chance that one of the five values will be entirely missed when sampling. But that is {{0.8 (chance not to take a value) ^ 2000 * 5 (any of the five can be missing) ~ 10^-193}}, so that is probably not going to happen :).
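[Editor's note: the arithmetic in that estimate checks out. Each sampled hit misses a given facet value with probability 0.8, so over 2000 samples, union-bounding over the five values:]

\[
5 \cdot 0.8^{2000} = 5 \cdot 10^{2000 \log_{10} 0.8} \approx 5 \cdot 10^{-193.8} \approx 10^{-193}
\]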
[jira] [Updated] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Audenaerde updated LUCENE-5476:
-----------------------------------
    Attachment: LUCENE-5476.patch
[jira] [Comment Edited] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922311#comment-13922311 ]

Vishmi Money edited comment on LUCENE-5422 at 3/6/14 11:03 AM:
---------------------------------------------------------------

Dmitry Kan, Otis Gospodnetic, thank you very much for your explanations; I now have a clear idea about the two issues. As new documents are added, segments are merged into the index, but if some documents are deleted, we have to keep track of those using skip entries. Meanwhile, we have to preserve or improve the performance of the operation. That is the area discussed in LUCENE-2082. In LUCENE-5422, we want to make synonyms and exact/inexact terms point to the same posting list, while also providing wildcard support. The main objective is to save space. Meanwhile, we also have to avoid index bloat as much as possible. LUCENE-5422 relates to LUCENE-2082 because LUCENE-5422 has to deal with segment merging anyway. This is the idea I got; please let me know if I am wrong about something.

Currently I am following the Lucene 4.7.0 documentation and familiarizing myself with the source code and coding conventions. I also follow Michael McCandless's blog and have read a few related posts, such as "Visualizing Lucene's segment merges" and "Building a new Lucene posting format". I also started reading the Lucene in Action (second edition) book, but then noticed that it covers Lucene 3.0. As Lucene 4.0 switched to a new pluggable codec architecture, I wonder whether all the content of the book is still relevant. Shall I proceed with the reading, or should I only look at the documentation for Lucene 4.0 and above?
[jira] [Commented] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922311#comment-13922311 ]

Vishmi Money commented on LUCENE-5422:
--------------------------------------

Dmitry Kan, Otis Gospodnetic, thank you very much for your explanations; I now have a clear idea about the two issues. As new documents are added, segments are merged into the index, but if some documents are deleted, we have to keep track of those using skip entries. Meanwhile, we have to preserve or improve the performance of the operation. That is the area discussed in LUCENE-2082. In LUCENE-5422, we want to make synonyms and exact/inexact terms point to the same posting list, while also providing wildcard support. The main objective is to save space. Meanwhile, we also have to avoid index bloat as much as possible. LUCENE-5422 relates to LUCENE-2082 because LUCENE-5422 has to deal with segment merging anyway. This is the idea I got; please let me know if I am wrong about something.

Currently I am following the Lucene 4.7.0 documentation and familiarizing myself with the source code and coding conventions. I also follow Michael McCandless's blog and have read a few related posts, such as "Visualizing Lucene's segment merges" and "Building a new Lucene posting format". I also started reading the Lucene in Action (second edition) book, but then noticed that it covers Lucene 3.0. As Lucene 4.0 switched to a new pluggable codec architecture, I wonder whether all the content of the book is still relevant. Shall I proceed with the reading, or should I only look at the documentation for Lucene 4.0 and above?

Postings lists deduplication
----------------------------
    Key: LUCENE-5422
    URL: https://issues.apache.org/jira/browse/LUCENE-5422
    Project: Lucene - Core
    Issue Type: Improvement
    Components: core/codecs, core/index
    Reporter: Dmitry Kan
    Labels: gsoc2014

The context: http://markmail.org/thread/tywtrjjcfdbzww6f

Robert Muir and I discussed what Robert eventually named "postings lists deduplication" at the Berlin Buzzwords 2013 conference. The idea is to allow multiple terms to point to the same postings list to save space. This can be achieved by a new index codec implementation, but this jira is open to other ideas as well. The application / impact of this is positive for synonyms, exact / inexact terms, leading wildcard support via storing a reversed term, etc. For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both the unstemmed and stemmed variants of a word form, and that leads to index bloating. That is why we had to remove the leading wildcard support via reversing a token at index and query time, because of the same index size considerations.

Comment from Mike McCandless: Neat idea! Would this idea allow a single term to point to (the union of) N other posting lists? It seems like that's necessary e.g. to handle the exact/inexact case. And then, to produce the DocsAndPositionsEnum you'd need to do a merge sort across those N posting lists? Such a thing might also be doable as a runtime-only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem -> all of its surface forms).

Comment from Robert Muir: I think the exact/inexact is trickier (detecting it would be the hard part), and you are right, another solution might work better. But for the reverse wildcard and synonyms situation, it seems we could even detect it on write if we created some hash of the previous term's postings. If the hash matches for the current term, we know it might be a duplicate and would have to actually do the costly check that they are the same. Maybe there are better ways to do it, but it might be a fun posting-format experiment to try.
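[Editor's note: a rough sketch of the write-time duplicate detection Robert describes -- hash each term's postings as they are written, and on a hash match fall back to a full comparison before sharing the list. The types and the hash choice are illustrative assumptions, not a Lucene API.]

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch: detect candidate duplicate postings lists at write time via hashing.
final class PostingsDedupSketch {
  // Maps a cheap hash of a postings list to the first list that produced it.
  private final Map<Long, int[]> seen = new HashMap<>();

  /**
   * Returns a previously written postings list to share if {@code postings}
   * is an exact duplicate, or null if this list is new. The hash only
   * nominates candidates; equality is always verified before sharing.
   */
  int[] dedup(int[] postings) {
    long h = hash(postings);
    int[] candidate = seen.get(h);
    if (candidate != null && Arrays.equals(candidate, postings)) {
      return candidate;              // verified duplicate: share the existing list
    }
    seen.putIfAbsent(h, postings);   // remember the first list per hash bucket
    return null;
  }

  private static long hash(int[] postings) {
    long h = 1125899906842597L;      // simple rolling hash over doc IDs
    for (int doc : postings) {
      h = 31 * h + doc;
    }
    return h;
  }
}
{code}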
[jira] [Comment Edited] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922311#comment-13922311 ]

Vishmi Money edited comment on LUCENE-5422 at 3/6/14 11:06 AM:
---------------------------------------------------------------

Dmitry Kan, Otis Gospodnetic, thank you very much for your explanations; I now have a clear idea about the two issues. As new documents are added, segments are merged into the index, but if some documents are deleted, we have to keep track of those using skip entries. Meanwhile, we have to preserve or improve the performance of the operation. That is the area discussed in LUCENE-2082. In LUCENE-5422, we want to make synonyms and exact/inexact terms point to the same posting list, while also providing wildcard support. The main objective is to save space. Meanwhile, we also have to avoid index bloat as much as possible. LUCENE-5422 relates to LUCENE-2082 because LUCENE-5422 has to deal with segment merging anyway. This is the idea I got; please let me know if I am wrong about something.

Currently I am following the Lucene 4.7.0 documentation and familiarizing myself with the source code and coding conventions. I also follow Michael McCandless's blog and have read a few related posts, such as "Visualizing Lucene's segment merges" and "Building a new Lucene posting format". I also started reading the Lucene in Action (second edition) book, but then noticed that it covers Lucene 3.0. As Lucene 4.0 switched to a new pluggable codec architecture, I wonder whether all the content of the book is still relevant. Shall I proceed with the reading, or should I only look at the documentation for Lucene 4.0 and above?
[jira] [Comment Edited] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922311#comment-13922311 ]

Vishmi Money edited comment on LUCENE-5422 at 3/6/14 11:16 AM:
---------------------------------------------------------------

[~dmitry_key], [~otis], thank you very much for your explanations; I now have a clear idea about the two issues. As new documents are added, segments are merged into the index, but if some documents are deleted, we have to keep track of those using skip entries. Meanwhile, we have to preserve or improve the performance of the operation. That is the area discussed in LUCENE-2082. In LUCENE-5422, we want to make synonyms and exact/inexact terms point to the same posting list, while also providing wildcard support. The main objective is to save space. Meanwhile, we also have to avoid index bloat as much as possible. LUCENE-5422 relates to LUCENE-2082 because LUCENE-5422 has to deal with segment merging anyway. This is the idea I got; please let me know if I am wrong about something.

Currently I am following the Lucene 4.7.0 documentation and familiarizing myself with the source code and coding conventions. I also follow Michael McCandless's blog and have read a few related posts, such as "Visualizing Lucene's segment merges" and "Building a new Lucene posting format". I also started reading the Lucene in Action (second edition) book, but then noticed that it covers Lucene 3.0. As Lucene 4.0 switched to a new pluggable codec architecture, I wonder whether all the content of the book is still relevant. Shall I proceed with the reading, or should I only look at the documentation for Lucene 4.0 and above?
Suggestions about writing / extending QueryParsers
Hi all,

I'm thinking about writing/extending a QueryParser for MLT queries. I've never really looked into that code too much; while I'm doing that now, I'm wondering if anyone has suggestions on how to start with such a topic. Should I write a new grammar for that? Or can I just extend an existing grammar/class?

Thanks in advance,
Tommaso
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922335#comment-13922335 ]

Robert Muir commented on LUCENE-5493:
-------------------------------------

Upon review of the APIs, I think the ideal for the user is to remove the current sorter/comparators, so that when you want to use the sorting merge policy, you just pass it a normal org.apache.lucene.search.Sort. I know it seems a little crazy, but IMO the logic is duplicated. So someone should just be doing:

{code}
Sort sort = new Sort(new SortField("field1", SortField.Type.DOUBLE), new SortField(...));
iwc.setMergePolicy(new SortingMergePolicy(mp, sort));
{code}

This would let people sort in reverse, by doubles/floats, by a combination of fields, by expressions, whatever. And it would deconfuse the API.
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922354#comment-13922354 ]

Michael McCandless commented on LUCENE-5493:
--------------------------------------------

That would be great, if we could just use Sort here! +1 to deconfuse the API.
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922358#comment-13922358 ]

Robert Muir commented on LUCENE-5493:
-------------------------------------

Another reason is that IndexSearcher already knows about Sort, so this would give us a path to better integration here in the future. If we did it right, no additional info/APIs from the user would be needed other than setting the merge policy at index time: indexSearcher.search(query, filter, int, sort), for example, could do the right thing for a segment if the passed-in query-time sort is covered by the sort order of the index. But that's for the future.
[jira] [Commented] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922374#comment-13922374 ]

Gilad Barkai commented on LUCENE-5476:
--------------------------------------

Hi Rob, the patch looks great. A few comments:

* Some imports are not used (o.a.l.u.Bits, o.a.l.s.Collector, o.a.l.s.DocIdSet).
* Perhaps the parameters initialized in the RandomSamplingFacetsCollector c'tor could be made {{final}}.
* XORShift64Random.XORShift64Random() (the default c'tor) is never used. Perhaps it was needed for usability when this was thought to become a core utility and was left in by mistake? Should it be called somewhere?
* {{getMatchingDocs()}}
** When {{!sampleNeeded()}} there's a call to {{super.getMatchingDocs()}}; this may be a redundant method call, as 5 lines above we call it and the code always computes the {{totalHits}} first. Perhaps the original matching docs could be stored as a member? This would also help some implementations of correcting the sampled facet results.
** {{totalHits}} is redundantly computed again in lines 147-152.
* {{needsSampling()}} could perhaps be protected, allowing other criteria for sampling to be added.
* {{createSample()}}
** {{randomIndex}} is initialized to {{0}}, effectively making the first document of every segment's first bin the one selected as the representative of that bin, neglecting the rest of the bin (regardless of the seed). So if a bin is the size of 1000 documents, then there are 999 documents that would always be neglected, regardless of the seed. It may be better to initialize it as {{randomIndex = random.nextInt(binsize)}}, as happens for the 2nd and later bins.
** While creating a new {{MatchingDocs}} with the sampled set, the original {{totalHits}} and original {{scores}} are used. I'm not 100% sure the first is an issue, but any facet accumulation that relies on document scores would be hit by the second, as the {{scores}} (at least per the javadocs) are defined as non-sparse.
[jira] [Commented] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922411#comment-13922411 ]

Rob Audenaerde commented on LUCENE-5476:
----------------------------------------

Thanks,

{quote}
When !sampleNeeded() there's a call to super.getMatchingDocs(); this may be a redundant method call, as 5 lines above we call it and the code always computes the totalHits first. Perhaps the original matching docs could be stored as a member? This would also help some implementations of correcting the sampled facet results.
totalHits is redundantly computed again in lines 147-152.
{quote}

How could I have missed this... Must take a break, I think.

On {{createSample}}: I always take the first document because I did not implement carrying over between segments. If I picked a random index and that index were greater than the number of documents in the segment, the segment would not be sampled at all. That results in 'too few' sampled documents. Always taking the first might result in 'too many', but that gave a better overall distribution and average. Given your argument about not-so-random documents, and the fact that carry-over should not be that hard, I should implement carry-over anyway.
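[Editor's note: a minimal sketch of the bin sampling with carry-over being discussed -- one document kept per bin, the offset within each bin drawn at random, and a partially filled bin at a segment boundary carried into the next segment. All names are illustrative assumptions, not the patch.]

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch: keep one randomly chosen document out of every `binSize` hits,
// carrying the partially filled bin across segment boundaries.
final class BinSamplerSketch {
  private final int binSize;
  private final Random random;
  private int posInBin;    // how far we are into the current bin
  private int pickIndex;   // which position in the current bin gets kept

  BinSamplerSketch(int binSize, long seed) {
    this.binSize = binSize;
    this.random = new Random(seed);
    this.pickIndex = random.nextInt(binSize); // random, not always the first
  }

  /** Feed one segment's hits; returns the hits sampled from this segment. */
  List<Integer> sampleSegment(int[] segmentDocs) {
    List<Integer> kept = new ArrayList<>();
    for (int doc : segmentDocs) {
      if (posInBin == pickIndex) {
        kept.add(doc);                  // this bin's representative
      }
      if (++posInBin == binSize) {      // bin complete: start a new one
        posInBin = 0;
        pickIndex = random.nextInt(binSize);
      }
      // posInBin/pickIndex deliberately survive to the next call, so a bin
      // straddling a segment boundary is still sampled exactly once.
    }
    return kept;
  }
}
{code}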
[jira] [Updated] (SOLR-5265) Add backward compatibility tests to JavaBinCodec's format.
[ https://issues.apache.org/jira/browse/SOLR-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated SOLR-5265:
--------------------------------
    Attachment: SOLR-5265.patch

An attempt at tackling this Jira:
1. The test will ensure that if we ever change the byte values of existing variables in JavaBinCodec, or if we write a type differently, it will fail.
2. If new types are added to JavaBinCodec, the test case and the binary file will have to be updated again.

There are a couple of nocommits, but I wanted to know if I am on the right track.

Add backward compatibility tests to JavaBinCodec's format.
----------------------------------------------------------
    Key: SOLR-5265
    URL: https://issues.apache.org/jira/browse/SOLR-5265
    Project: Solr
    Issue Type: Test
    Reporter: Adrien Grand
    Priority: Blocker
    Fix For: 4.7
    Attachments: SOLR-5265.patch

Since Solr guarantees backward compatibility of JavaBinCodec's format between releases, we should have tests for it.
RE: Suggestions about writing / extending QueryParsers
Hi Tommaso,

It will depend on how different your target syntax will be. If you extend the classic parser (or, rather, QueryParserBase), there is a fair amount of overhead and extras that you might not want or need. On the other hand, the query syntax and the methods will be familiar to the Lucene community, and there is a large number of test cases already built for you. On the third hand, if you need to modify the low-level parsing stuff, you'll have to be familiar with javacc.

There's the "flexible" family, which should allow for easy modifications, and the "xml" family could offer an easy interface between a custom lexer and a parser. The SimpleQueryParser offers a model of building something fairly simple and yet very elegant from scratch.

In deciding where to start, another consideration might be how easy it will be to integrate at the Solr level. Make sure to include field-based hooks for processing multiterms, prefix and range queries.

For LUCENE-5205, I eventually chose to subclass QueryParserBase, and I had to override a fair amount of code because every terminal had to be a SpanQuery -- most of the queryparser infrastructure is built for traditional queries.

So, what features do you want to add for MLT? What capabilities do you need?

Cheers,
Tim
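[Editor's note: as a concrete starting point for the "extend the classic parser" route, a minimal subclass might look like the following. The MLT-specific behavior is left as a stub, since what it should do is exactly what's under discussion; Lucene 4.x package names are assumed.]

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Sketch: hook into the classic parser without touching the javacc grammar.
// Overriding the new*Query factory methods is usually enough when the surface
// syntax stays the same and only query construction changes.
public class MLTQueryParser extends QueryParser {

  public MLTQueryParser(Version matchVersion, String field, Analyzer analyzer) {
    super(matchVersion, field, analyzer);
  }

  @Override
  protected Query newTermQuery(Term term) {
    // Hypothetical extension point: build an MLT-flavored query for this term
    // (e.g. wrap, boost, or expand it) instead of a plain TermQuery.
    return super.newTermQuery(term);
  }
}
{code}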
[jira] [Updated] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-5493:
--------------------------------
    Attachment: LUCENE-5493-poc.patch

Here is a very simple proof-of-concept patch. I made a SortSorter (extends the existing Sorter API and takes o.a.l.s.Sort), removed NumericDocValuesSorter, and replaced it with this more general Sort-Sorter in all tests, and they pass.

So my next step would be to remove public APIs like Sorter/DocMap and make that all internal. SortingMP and EarlyTerminatingSortingCollector would just take Sort directly. BlockJoinSorter needs to be cut over to a regular comparator. And in suggest/ there is a custom comparator... that I think doesn't need to be custom and is just sorting on a DV field.
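[Editor's note: the core of what a Sorter/DocMap computes, as a simplified stand-in -- a permutation of doc IDs ordered by a per-document value (here a plain long[] playing the role of a numeric DocValues field). The real APIs take a reader and, per this proposal, a Sort; this is an illustration of the idea only.]

{code}
import java.util.Arrays;
import java.util.Comparator;

// Simplified sketch of the Sorter/DocMap idea: map old doc IDs to their
// positions in value-sorted order.
final class SortSorterSketch {
  interface DocMap { int oldToNew(int docID); }

  static DocMap sortByValue(final long[] values) {
    Integer[] order = new Integer[values.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    // Sort old doc IDs by their value; ties keep index order for stability.
    Arrays.sort(order, new Comparator<Integer>() {
      public int compare(Integer a, Integer b) {
        int cmp = Long.compare(values[a], values[b]);
        return cmp != 0 ? cmp : a.compareTo(b);
      }
    });
    final int[] newPositions = new int[values.length];
    for (int newID = 0; newID < order.length; newID++) {
      newPositions[order[newID]] = newID;
    }
    return new DocMap() {
      public int oldToNew(int docID) { return newPositions[docID]; }
    };
  }
}
{code}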
[jira] [Commented] (SOLR-5265) Add backward compatibility tests to JavaBinCodec's format.
[ https://issues.apache.org/jira/browse/SOLR-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922452#comment-13922452 ]

Noble Paul commented on SOLR-5265:
----------------------------------

* don't use toString() to compare the actual test values
* don't use FileInputStream; use getClass().getResourceAsStream("/solrj/updateReq_4_5.bin") as done in TestUpdateRequestCodec
* add the rest of the types
* we also need forward-compatibility tests
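[Editor's note: a rough sketch of the kind of back-compat check being discussed -- unmarshal a binary file written once by an older release (loaded from the classpath, per Noble's suggestion) and compare values structurally. The resource name and expected payload are placeholders, not the actual patch.]

{code}
import java.io.InputStream;
import org.apache.solr.common.util.JavaBinCodec;
import org.apache.solr.common.util.NamedList;

// Sketch of a JavaBinCodec backward-compatibility check. The .bin resource is
// assumed to have been serialized by an older release and committed to the
// test resources, so any change to the wire format breaks this check.
public class JavaBinBackCompatCheck {
  public void checkBackCompat() throws Exception {
    InputStream is = getClass().getResourceAsStream("/solrj/javabin_backcompat.bin"); // hypothetical file
    try {
      NamedList<?> nl = (NamedList<?>) new JavaBinCodec().unmarshal(is);
      // Compare individual values rather than toString() output.
      Object v = nl.get("intValue"); // hypothetical key written by the old release
      if (!Integer.valueOf(42).equals(v)) {
        throw new AssertionError("JavaBin format changed incompatibly, got: " + v);
      }
    } finally {
      is.close();
    }
  }
}
{code}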
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922468#comment-13922468 ]

ASF subversion and git services commented on LUCENE-5493:
----------------------------------------------------------

Commit 1574867 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574867 ]

LUCENE-5493: commit current state
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922467#comment-13922467 ]

ASF subversion and git services commented on LUCENE-5493:
----------------------------------------------------------

Commit 1574866 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574866 ]

LUCENE-5493: create branch
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922507#comment-13922507 ]

Adrien Grand commented on LUCENE-5493:
--------------------------------------

+1 on making those classes wrap a `Sort`. I had started working on it for LUCENE-5314 but never got a chance to get a patch ready.
[jira] [Created] (SOLR-5820) OverseerCollectionProcessor#lookupReplicas has a timeout that is too short and a bad error message on timeout.
Mark Miller created SOLR-5820:
---------------------------------
    Summary: OverseerCollectionProcessor#lookupReplicas has a timeout that is too short and a bad error message on timeout.
    Key: SOLR-5820
    URL: https://issues.apache.org/jira/browse/SOLR-5820
    Project: Solr
    Issue Type: Bug
    Reporter: Mark Miller
    Assignee: Mark Miller
    Fix For: 4.8, 5.0
[jira] [Commented] (SOLR-5820) OverseerCollectionProcessor#lookupReplicas has a timeout that is too short and a bad error message on timeout.
[ https://issues.apache.org/jira/browse/SOLR-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922524#comment-13922524 ]

Mark Miller commented on SOLR-5820:
-----------------------------------

Test failures in creating collections led me to this.
[jira] [Commented] (SOLR-5820) OverseerCollectionProcessor#lookupReplicas has a timeout that is too short and a bad error message on timeout.
[ https://issues.apache.org/jira/browse/SOLR-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922533#comment-13922533 ]

Mark Miller commented on SOLR-5820:
-----------------------------------

This also ends up being a fairly ugly fail: because of the retries, the user basically ends up seeing that creating the collection failed because it already exists.
[jira] [Commented] (SOLR-5820) OverseerCollectionProcessor#lookupReplicas has a timeout that is too short and a bad error message on timeout.
[ https://issues.apache.org/jira/browse/SOLR-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922540#comment-13922540 ]

ASF subversion and git services commented on SOLR-5820:
--------------------------------------------------------

Commit 1574883 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1574883 ]

SOLR-5820: OverseerCollectionProcessor#lookupReplicas has a timeout that is too short and a bad error message on timeout.
Re: Suggestions about writing / extending QueryParsers
Tommaso, Do say more about what you're thinking of. I'm currently getting my dev environment up to look into enhancing the MoreLikeThisHandler to be able to handle function query boosts. This should be eminently possible, from my initial research. However, if you're thinking of something more powerful, perhaps we can work together. Upayavira On Thu, Mar 6, 2014, at 11:23 AM, Tommaso Teofili wrote: Hi all, I'm thinking about writing/extending a QueryParser for MLT queries. I've never really looked into that code much; I'm doing so now, and I'm wondering if anyone has suggestions on how to start with such a topic. Should I write a new grammar for that? Or can I just extend an existing grammar/class? Thanks in advance, Tommaso
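On the grammar question: one way to start is to skip JavaCC entirely and hook into Solr's query-parser plugin mechanism, delegating the base parsing to an existing parser. A minimal sketch, assuming the 4.x QParserPlugin/QParser API; the class name and the delegation to the 'lucene' parser are illustrative choices, not an actual MLT implementation:
{code}
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

// Hypothetical plugin; it would be registered in solrconfig.xml, e.g.
// <queryParser name="mlt" class="com.example.MLTQParserPlugin"/>
public class MLTQParserPlugin extends QParserPlugin {

  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        // Reuse an existing grammar for the base query; MLT-specific
        // term selection and boosting would wrap the result here.
        return QParser.getParser(qstr, "lucene", req).getQuery();
      }
    };
  }
}
{code}
Registered that way, the parser becomes reachable via local-params syntax such as {!mlt}, which is often less work than writing and maintaining a new grammar.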
[jira] [Commented] (SOLR-5820) OverseerCollectionProcessor#lookupReplicas has a timeout that is too short and a bad error message on timeout.
[ https://issues.apache.org/jira/browse/SOLR-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922544#comment-13922544 ] ASF subversion and git services commented on SOLR-5820: --- Commit 1574884 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1574884 ] SOLR-5820: OverseerCollectionProcessor#lookupReplicas has a timeout that is too short and a bad error message on timeout. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922553#comment-13922553 ] Otis Gospodnetic commented on LUCENE-5422: -- Maybe [~mikemccand] can comment, but I think you are right as far as the Codecs part of Lucene and LIA are concerned. Postings lists deduplication Key: LUCENE-5422 URL: https://issues.apache.org/jira/browse/LUCENE-5422 Project: Lucene - Core Issue Type: Improvement Components: core/codecs, core/index Reporter: Dmitry Kan Labels: gsoc2014 The context: http://markmail.org/thread/tywtrjjcfdbzww6f Robert Muir and I discussed what Robert eventually named postings lists deduplication at the Berlin Buzzwords 2013 conference. The idea is to allow multiple terms to point to the same postings list to save space. This can be achieved by a new index codec implementation, but this jira is open to other ideas as well. The application / impact of this is positive for synonyms, exact / inexact terms, leading wildcard support via storing reversed terms, etc. For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both the unstemmed and the stemmed variant of a word form, and that leads to index bloat. For the same index size considerations, we had to remove leading wildcard support via reversing a token at index and query time. Comment from Mike McCandless: Neat idea! Would this idea allow a single term to point to (the union of) N other postings lists? It seems like that's necessary e.g. to handle the exact/inexact case. And then, to produce the Docs/AndPositionsEnum you'd need to do the merge sort across those N postings lists? Such a thing might also be do-able as a runtime-only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem -> all of its surface forms). Comment from Robert Muir: I think the exact/inexact case is trickier (detecting it would be the hard part), and you are right, another solution might work better. But for the reverse wildcard and synonyms situation, it seems we could even detect it on write if we created some hash of the previous term's postings. If the hash matches for the current term, we know it might be a duplicate and would have to actually do the costly check that they are the same. Maybe there are better ways to do it, but it might be a fun postings-format experiment to try. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
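Robert's write-time detection can be modelled without touching any codec API. A minimal sketch, assuming each term arrives with its complete sorted doc-ID list (real postings also carry freqs and positions, which would have to be part of the fingerprint and the equality check):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy write-time dedup: a cheap hash flags a *candidate* duplicate of an
// earlier postings list; only then do we pay for the full equality check.
public class PostingsDedupSketch {

  private final Map<Integer, List<int[]>> byHash =
      new HashMap<Integer, List<int[]>>();

  /** Returns an identical, previously seen postings list to share, or null. */
  public int[] dedup(int[] postings) {
    int hash = Arrays.hashCode(postings);        // cheap fingerprint
    List<int[]> candidates = byHash.get(hash);
    if (candidates == null) {
      candidates = new ArrayList<int[]>();
      byHash.put(hash, candidates);
    }
    for (int[] candidate : candidates) {
      if (Arrays.equals(candidate, postings)) {  // costly exact check
        return candidate;                        // share instead of re-storing
      }
    }
    candidates.add(postings);
    return null;
  }

  public static void main(String[] args) {
    PostingsDedupSketch dedup = new PostingsDedupSketch();
    System.out.println(dedup.dedup(new int[] {1, 5, 9}));         // null: new
    System.out.println(dedup.dedup(new int[] {1, 5, 9}) != null); // true: dup
  }
}
{code}
In a real postings format the stored side would be a file pointer rather than the array itself, so two terms whose lists match would simply share the same pointer.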
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922620#comment-13922620 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574909 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574909 ] LUCENE-5493: make BlockJoinSorter a ComparatorSource taking parent/child Sort -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922627#comment-13922627 ] Michael McCandless commented on LUCENE-5422: I think reading blogs, javadocs, and CHANGES entries are all good ways to come up to speed on the recent changes in Lucene. And, yes, LUCENE-2082 is about more efficient merging, by appending raw postings bytes instead of the decode + re-encode that's done today. It's analogous to how Lucene used to fully decode and then re-encode each Document (stored fields) during merging, but today we just do bulk copying of bytes when possible (same for term vectors). I think this issue needs better scoping / maybe a clearer use case, to understand exactly when the postings list deduping should kick in. And if this incurs a search-time cost (e.g. a merge sort of N postings lists to make them look like a single postings list), that's an added cost that may be the wrong tradeoff (smaller index but slower searching) in most cases. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
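The search-time cost Mike describes is essentially a k-way merge: a term that points at the union of N stored lists has to merge-sort them on the fly to present one docs enum. A self-contained sketch over plain sorted int[] doc-ID lists (illustration only; a real enum would also interleave freqs and positions):
{code}
import java.util.Comparator;
import java.util.PriorityQueue;

// Merges N sorted doc-ID lists into one sorted, de-duplicated stream,
// the way a unioned term would have to at search time.
public class PostingsUnionSketch {

  static void printUnion(int[][] lists) {
    // Queue entries are {docId, listIndex, offsetInList}, ordered by docId.
    PriorityQueue<int[]> pq = new PriorityQueue<int[]>(
        Math.max(1, lists.length),
        new Comparator<int[]>() {
          public int compare(int[] a, int[] b) { return a[0] - b[0]; }
        });
    for (int i = 0; i < lists.length; i++) {
      if (lists[i].length > 0) pq.add(new int[] {lists[i][0], i, 0});
    }
    int last = -1;
    while (!pq.isEmpty()) {
      int[] top = pq.poll();
      if (top[0] != last) {       // collapse docs present in several lists
        System.out.print(top[0] + " ");
        last = top[0];
      }
      int next = top[2] + 1;      // advance within the source list
      if (next < lists[top[1]].length) {
        pq.add(new int[] {lists[top[1]][next], top[1], next});
      }
    }
    System.out.println();
  }

  public static void main(String[] args) {
    printUnion(new int[][] {{1, 4, 9}, {2, 4, 7}, {0, 9}});
    // prints: 0 1 2 4 7 9
  }
}
{code}
Every emitted doc pays an O(log N) queue operation, which is the "slower searching" side of the tradeoff.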
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922641#comment-13922641 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574918 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574918 ] LUCENE-5493: hide Sorter, SortSorter, fix tests, change suggest to use public Sort API, cut over collector to take Sort -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5492) IndexFileDeleter AssertionError in presence of *_upgraded.si files
[ https://issues.apache.org/jira/browse/LUCENE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922646#comment-13922646 ] Tim Smith commented on LUCENE-5492: --- Narrowing it down: definitely seeing a reference count issue. This only seems to occur when using the DirectoryReader.open(IndexWriter ...) methods. For one particular commit point, segments_4, I see the following refcount behavior:
* incref segments_4
** incref _0_upgraded.si refcount=3
** decref _0_upgraded.si refcount=2
* incref segments_4
** NOTE: _0_upgraded.si not incref'd this time
* ...
* delete segments_4
** decref _0_upgraded.si ERROR
IndexFileDeleter AssertionError in presence of *_upgraded.si files -- Key: LUCENE-5492 URL: https://issues.apache.org/jira/browse/LUCENE-5492 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.7 Reporter: Tim Smith When calling IndexWriter.deleteUnusedFiles against an index that contains 3.x segments, I am seeing the following exception:
{code}
java.lang.AssertionError: failAndDumpStackJunitStatment: RefCount is 0 pre-decrement for file _0_upgraded.si
at org.apache.lucene.index.IndexFileDeleter$RefCount.DecRef(IndexFileDeleter.java:630)
at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:514)
at org.apache.lucene.index.IndexFileDeleter.deleteCommits(IndexFileDeleter.java:286)
at org.apache.lucene.index.IndexFileDeleter.revisitPolicy(IndexFileDeleter.java:393)
at org.apache.lucene.index.IndexWriter.deleteUnusedFiles(IndexWriter.java:4617)
{code}
I believe this is caused by IndexFileDeleter not being aware of the Lucene3x Segment Infos Format (notably the _upgraded.si files created to upgrade an old index). This is new in 4.7 and did not occur in 4.6.1. Still trying to track down a workaround/fix. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
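The trace above reads like asymmetric accounting: the second incref of segments_4 skips _0_upgraded.si, so deleting the commit decrements the file once more than it was incremented. A toy model of that invariant (not Lucene's actual IndexFileDeleter, from which the quoted assertion comes); run with -ea so the assert fires:
{code}
import java.util.HashMap;
import java.util.Map;

// Toy per-file reference counting in the style of an index file deleter:
// each commit referencing a file must incRef it exactly once, and each
// deleted commit must decRef it exactly once, or the count goes negative.
public class RefCountSketch {

  private final Map<String, Integer> refCounts = new HashMap<String, Integer>();

  void incRef(String file) {
    Integer rc = refCounts.get(file);
    refCounts.put(file, rc == null ? 1 : rc + 1);
  }

  void decRef(String file) {
    Integer rc = refCounts.get(file);
    // The invariant whose violation is reported in the comment above.
    assert rc != null && rc > 0 : "RefCount is 0 pre-decrement for file " + file;
    if (rc != null && rc > 1) {
      refCounts.put(file, rc - 1);
    } else {
      refCounts.remove(file);  // last reference gone: file becomes deletable
    }
  }

  public static void main(String[] args) {
    RefCountSketch deleter = new RefCountSketch();
    deleter.incRef("_0_upgraded.si");  // commit 1 references the file
    deleter.incRef("_0_upgraded.si");  // commit 2 references it too
    deleter.decRef("_0_upgraded.si");  // commit 1 deleted
    deleter.decRef("_0_upgraded.si");  // commit 2 deleted: count is now 0
    deleter.decRef("_0_upgraded.si");  // a commit that never incRef'd: trips
  }
}
{code}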
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922656#comment-13922656 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574925 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574925 ] LUCENE-5493: remove dead code -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922658#comment-13922658 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574926 from [~mikemccand] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574926 ] LUCENE-5493: small clean ups -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5637) Per-request cache statistics
[ https://issues.apache.org/jira/browse/SOLR-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shikhar Bhushan updated SOLR-5637: -- Fix Version/s: (was: 4.7) Per-request cache statistics Key: SOLR-5637 URL: https://issues.apache.org/jira/browse/SOLR-5637 Project: Solr Issue Type: New Feature Reporter: Shikhar Bhushan Priority: Minor Attachments: SOLR-5367.patch, SOLR-5367.patch We have found it very useful to have information on the number of cache hits and misses for key Solr caches (filterCache, documentCache, etc.) at the request level. This is currently implemented in our codebase using custom {{SolrCache}} implementations. I am working on moving to maintaining stats in the {{SolrRequestInfo}} thread-local, and adding hooks in get() methods of SolrCache implementations. This will be glued up using the {{DebugComponent}} and can be requested using a debug.cache parameter. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5637) Per-request cache statistics
[ https://issues.apache.org/jira/browse/SOLR-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shikhar Bhushan updated SOLR-5637: -- Attachment: SOLR-5637.patch updated patch against lucene_solr_4_7 branch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
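The thread-local design described in the issue can be sketched without any Solr types. Everything here is hypothetical (the class, recordHit/recordMiss, and the idea that a wrapped SolrCache get() would call them); it only illustrates the mechanism, not the attached patch:
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-request cache statistics, kept in a thread-local so a
// cache's get() can record hits/misses without plumbing a context through.
public class RequestCacheStats {

  // {hits, misses} per cache name, one map per request-handling thread.
  private static final ThreadLocal<Map<String, long[]>> STATS =
      new ThreadLocal<Map<String, long[]>>() {
        @Override protected Map<String, long[]> initialValue() {
          return new HashMap<String, long[]>();
        }
      };

  private static long[] counters(String cache) {
    Map<String, long[]> m = STATS.get();
    long[] c = m.get(cache);
    if (c == null) { c = new long[2]; m.put(cache, c); }
    return c;
  }

  public static void recordHit(String cache)  { counters(cache)[0]++; }
  public static void recordMiss(String cache) { counters(cache)[1]++; }

  /** Snapshot for the debug section; also resets for the next request. */
  public static Map<String, long[]> snapshotAndClear() {
    Map<String, long[]> m = STATS.get();
    STATS.remove();
    return m;
  }
}
{code}
A cache get() hook would then be a two-liner: on a null lookup call recordMiss("filterCache"), otherwise recordHit("filterCache"), and the debug-component side would call snapshotAndClear() when assembling the debug.cache output.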
[jira] [Commented] (LUCENE-5492) IndexFileDeleter AssertionError in presence of *_upgraded.si files
[ https://issues.apache.org/jira/browse/LUCENE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922663#comment-13922663 ] Michael McCandless commented on LUCENE-5492: Hmm, I wonder if this is related to LUCENE-5434; we added incRef/decRef for NRT readers pulled from IndexWriter. If you revert that change locally, do you still see this happening? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Problems installing Pylucene on Ubuntu 12.04
On Thu, 6 Mar 2014, Ritzschke, Uwe wrote: [...] But if I run make test, I get the errors attached below. It looks like there is a left-over 'import pdb; pdb.set_trace()' statement in the test_PythonDirectory.py test, at line 260. Please remove it and re-run the tests. Thanks! Andi.. [...]
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922667#comment-13922667 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574928 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574928 ] LUCENE-5493: minor cleanups/opto -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5768) Add a distrib.singlePass parameter to make GET_FIELDS phase fetch all fields and skip EXECUTE_QUERY
[ https://issues.apache.org/jira/browse/SOLR-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922669#comment-13922669 ] Shikhar Bhushan commented on SOLR-5768: --- seems like the JIRA title has it the other way round :) Add a distrib.singlePass parameter to make GET_FIELDS phase fetch all fields and skip EXECUTE_QUERY --- Key: SOLR-5768 URL: https://issues.apache.org/jira/browse/SOLR-5768 Project: Solr Issue Type: Improvement Reporter: Shalin Shekhar Mangar Priority: Minor Fix For: 4.8, 5.0 Suggested by Yonik on solr-user: http://www.mail-archive.com/solr-user@lucene.apache.org/msg95045.html {quote} Although it seems like it should be relatively simple to make it work with other fields as well, by passing down the complete fl requested if some optional parameter is set (distrib.singlePass?) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
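For background: a distributed query normally runs in two phases, where the first (EXECUTE_QUERY) returns only ids and sort values from each shard and the second (GET_FIELDS) fetches stored fields for the merged top documents; hence the comment that the title is reversed, since a single pass would make the first phase fetch the fields and skip GET_FIELDS. A toy model of just the control flow; Shard and the method names are invented for illustration, not Solr's API:
{code}
import java.util.ArrayList;
import java.util.List;

// Toy model of two-phase vs. single-pass distributed retrieval.
public class SinglePassSketch {

  interface Shard {
    List<String> topIds(String q);             // phase 1: ids + sort values
    List<String> fields(List<String> ids);     // phase 2: stored fields by id
    List<String> topDocsWithFields(String q);  // single pass: everything
  }

  static List<String> query(List<Shard> shards, String q, boolean singlePass) {
    List<String> results = new ArrayList<String>();
    if (singlePass) {
      // One round trip per shard; each response is larger because it
      // already carries the full fl, but the second phase is skipped.
      for (Shard s : shards) results.addAll(s.topDocsWithFields(q));
    } else {
      List<String> merged = new ArrayList<String>();
      for (Shard s : shards) merged.addAll(s.topIds(q));        // round trip 1
      for (Shard s : shards) results.addAll(s.fields(merged));  // round trip 2
    }
    return results;
  }
}
{code}
The tradeoff is round trips versus response size: single pass halves the network chatter but makes every shard ship stored fields for documents that may not survive the merge.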
[jira] [Created] (LUCENE-5494) ArrayIndexOutOfBounds - WordBreakSolrSpellChecker.java:266
Mark Peck created LUCENE-5494: -- Summary: ArrayIndexOutOfBounds - WordBreakSolrSpellChecker.java:266 Key: LUCENE-5494 URL: https://issues.apache.org/jira/browse/LUCENE-5494 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.3 Environment: SolrNet, Uri interface Reporter: Mark Peck Priority: Minor When running the following query:
{code}
http://localhost:8983/solr/search/select?q=(%22active%2Bhuman%2Bcox-2%22+OR+(%22active%22+AND+%22human%22+AND+%22cox-2%22))&spellcheck=true
{code}
We get the following error output:
{code:xml}
<lst name="error">
  <str name="msg">9</str>
  <str name="trace">java.lang.ArrayIndexOutOfBoundsException: 9
    at org.apache.solr.spelling.WordBreakSolrSpellChecker.getSuggestions(WordBreakSolrSpellChecker.java:266)
    at org.apache.solr.spelling.ConjunctionSolrSpellChecker.getSuggestions(ConjunctionSolrSpellChecker.java:120)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:172)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)</str>
  <int name="code">500</int>
</lst>
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922686#comment-13922686 ] Brett Lucey commented on SOLR-2894: --- Elran - A facet limit of -1 in distributed pivot facets is not a use case we use in our environment, but we did go ahead and make the fixes in order to support the community. I've tested the changes locally on a box with success and added unit tests around it, but we have not yet deployed those changes to a production cluster. The exception you were seeing was directly related to the facet limit being negative, and that has been fixed in the patch I uploaded yesterday. Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Fix For: 4.7 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5628) Cloud test harness manifesting reproducible failures in TestDistribDocBasedVersion
[ https://issues.apache.org/jira/browse/SOLR-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922687#comment-13922687 ] ASF subversion and git services commented on SOLR-5628: --- Commit 1574941 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1574941 ] SOLR-5628: work around for this test to avoid whatever bug is in the cloud test framework Cloud test harness manifesting reproducible failures in TestDistribDocBasedVersion -- Key: SOLR-5628 URL: https://issues.apache.org/jira/browse/SOLR-5628 Project: Solr Issue Type: Bug Reporter: Hoss Man Jenkins uncovered a test seed that causes a reproducible IndexWriter assertion failure in TestDistribDocBasedVersion on the 4x branch. McCandless helped dig in, and we believe that something in the way the solr test framework is set up is causing the test to delete the index dirs before the IndexWriter is closed. Meanwhile, it appears that recent changes to 4x have caused the nature of the failure to change, so that now -- in addition to the IndexWriter assertion failure -- the test cleanup also stalls out and the test runner has to terminate some stalled threads. Details to follow in a comment, but here's the reproduce line...
{noformat}
ant test -Dtestcase=TestDistribDocBasedVersion -Dtests.seed=791402573DC76F3C -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_IQ -Dtests.timezone=Antarctica/Rothera -Dtests.file.encoding=US-ASCII
{noformat}
And the mail thread regarding this... https://mail-archives.apache.org/mod_mbox/lucene-dev/201401.mbox/%3Calpine.DEB.2.02.1401100930260.20275@frisbee%3E -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5494) ArrayIndexOutOfBounds - WordBreakSolrSpellChecker.java:266
[ https://issues.apache.org/jira/browse/LUCENE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Peck updated LUCENE-5494: --- Description: [...] (!) We have ascertained that this only happens when the '-2' is added to the search term. was: [...]
[jira] [Updated] (SOLR-5628) Cloud test harness can cause index files to be deleted before IndexWriter is closed
[ https://issues.apache.org/jira/browse/SOLR-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5628: --- Description: This bug was originally opened because Jenkins uncovered a test seed that causes a reproducible IndexWriter assertion failure in TestDistribDocBasedVersion on the 4x branch. McCandless helped dig in, and we believe that something in the way the solr test framework is set up is causing the test to delete the index dirs before the IndexWriter is closed. Meanwhile, the failures later reproduced in other seeds on both 4x and trunk -- and it appears that recent changes caused the nature of the failure to change, so that now -- in addition to the IndexWriter assertion failure -- the test cleanup also stalls out and the test runner has to terminate some stalled threads. One interesting factor about this test is that at the end of the test there were docs that had been added but not committed -- which is probably unusual for most tests, and may explain why more cloud tests aren't exhibiting similar symptoms more often. *When a useless (from the perspective of what the test is trying to verify) commit was added to the test, the failing seed stopped reproducing.* An example of how to reliably reproduce this problem on an (older version of) trunk...
{noformat}
svn update -r 1574381
ant clean
cd solr/core
ant test -Dtestcase=TestDistribDocBasedVersion -Dtests.seed=1249227945045A2E -Dtests.slow=true -Dtests.locale=ko_KR -Dtests.timezone=America/Monterrey -Dtests.file.encoding=ISO-8859-1
{noformat}
Original email thread... https://mail-archives.apache.org/mod_mbox/lucene-dev/201401.mbox/%3Calpine.DEB.2.02.1401100930260.20275@frisbee%3E was: [...] Summary: Cloud test harness can cause index files to be deleted before IndexWriter is closed (was: Cloud test harness manifesting reproducible failures in TestDistribDocBasedVersion) I've edited the description to reflect the updated state of things, since I've been able to commit a work-around to the original test that manifested the problem with the cloud test framework.
[jira] [Commented] (SOLR-5628) Cloud test harness can cause index files to be deleted before IndexWriter is closed
[ https://issues.apache.org/jira/browse/SOLR-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922697#comment-13922697 ] ASF subversion and git services commented on SOLR-5628: --- Commit 1574942 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1574942 ] SOLR-5628: work around for this test to avoid whatever bug is in the cloud test framework (merge r1574941) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922701#comment-13922701 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574945 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574945 ] LUCENE-5493: simplify this test -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922709#comment-13922709 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574949 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574949 ] LUCENE-5493: merge Sorter and SortSorter (in progress) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5796) With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a conflicting state about who is the leader
[ https://issues.apache.org/jira/browse/SOLR-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922711#comment-13922711 ] Timothy Potter commented on SOLR-5796: -- I'm still a little concerned about a couple of things: 1) why is the leader state stored in two places in ZooKeeper (/clusterstate.json and /collections/coll/leaders/shard#)? I'm sure there is a good reason for this but can't see why ;-) 2) if the timeout still occurs (as we don't want to wait forever), can't the node with the conflict just favor what's in the leader path, assuming that replica is active and agrees? In other words, instead of throwing an exception and then just ending up in a down state, why can't the replica seeing the conflict just go with what ZooKeeper says? I'm digging into leader failover timing / error handling today. Thanks. Tim
With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a conflicting state about who is the leader. Key: SOLR-5796 URL: https://issues.apache.org/jira/browse/SOLR-5796 Project: Solr Issue Type: Bug Components: SolrCloud Environment: Found on branch_4x Reporter: Timothy Potter Assignee: Mark Miller Fix For: 4.8, 5.0 Attachments: SOLR-5796.patch I'm doing some testing with a 4-node SolrCloud cluster against the latest rev in branch_4x having many collections, 150 to be exact, each having 4 shards with rf=3, so 450 cores per node. Nodes are decent in terms of resources: -Xmx6g with 4 CPUs - m3.xlarge's in EC2. The problem occurs when rebooting one of the nodes, say as part of a rolling restart of the cluster. If I kill one node and then wait for an extended period of time, such as 3 minutes, then all of the leaders on the downed node (roughly 150) have time to fail over to another node in the cluster. When I restart the downed node, since leaders have all failed over successfully, the new node starts up and all cores assume the replica role in their respective shards. This is goodness and expected. However, if I don't wait long enough for the leader failover process to complete on the other nodes before restarting the downed node, then some bad things happen. Specifically, when the dust settles, many of the previous leaders on the node I restarted get stuck in the conflicting state seen in the ZkController, starting around line 852 in branch_4x:
{code}
while (!leaderUrl.equals(clusterStateLeaderUrl)) {
  if (tries == 60) {
    throw new SolrException(ErrorCode.SERVER_ERROR,
        "There is conflicting information about the leader of shard: "
            + cloudDesc.getShardId() + " our state says: "
            + clusterStateLeaderUrl + " but zookeeper says: " + leaderUrl);
  }
  Thread.sleep(1000);
  tries++;
  clusterStateLeaderUrl = zkStateReader.getLeaderUrl(collection, shardId,
      timeoutms);
  leaderUrl = getLeaderProps(collection, cloudDesc.getShardId(), timeoutms)
      .getCoreUrl();
}
{code}
As you can see, the code is trying to give this problem a little time to work itself out, 1 minute to be exact. Unfortunately, that doesn't seem to be long enough for a busy cluster that has many collections. Now, one might argue that 450 cores per node is asking too much of Solr; however, I think this points to a bigger issue: a node coming up isn't aware that it went down, and leader election running on the other nodes is just being slow. Moreover, once this problem occurs, it's not clear how to fix it besides shutting the node down again and waiting for leader failover to complete. It's also interesting to me that /clusterstate.json was updated by the healthy node taking over the leader role but the /collections/coll/leaders/shard# was not updated? I added some debugging and it seems like the overseer queue is extremely backed up with work. Maybe the solution here is to just wait longer, but I also want to get some feedback from the community on other options? I know there are some plans to help scale the Overseer (i.e. SOLR-5476), so maybe that helps, and I'm trying to add more debug to see if this is really due to overseer backlog (which I suspect it is). In general, I'm a little confused by the keeping of leader state in multiple places in ZK. Is there any background information on why we have leader state in /clusterstate.json and in the leader path znode? Also, here are some interesting side
[jira] [Commented] (SOLR-3854) SolrCloud does not work with https
[ https://issues.apache.org/jira/browse/SOLR-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922717#comment-13922717 ] ASF subversion and git services commented on SOLR-3854: --- Commit 1574951 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1574951 ] SOLR-3854: IntelliJ config: add solr example lib test dependency to map-reduce and dataimporthandler contribs SolrCloud does not work with https -- Key: SOLR-3854 URL: https://issues.apache.org/jira/browse/SOLR-3854 Project: Solr Issue Type: Bug Reporter: Sami Siren Assignee: Mark Miller Fix For: 4.7, 5.0 Attachments: SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854v2.patch, SOLR-3854v3.patch, SOLR-3854v4.patch There are a few places in the current codebase that assume http is used. This prevents using https when running solr in cloud mode. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3854) SolrCloud does not work with https
[ https://issues.apache.org/jira/browse/SOLR-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922718#comment-13922718 ] ASF subversion and git services commented on SOLR-3854: --- Commit 1574953 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1574953 ] SOLR-3854: IntelliJ config: add solr example lib test dependency to map-reduce and dataimporthandler contribs (merged trunk r1574951) SolrCloud does not work with https -- Key: SOLR-3854 URL: https://issues.apache.org/jira/browse/SOLR-3854 Project: Solr Issue Type: Bug Reporter: Sami Siren Assignee: Mark Miller Fix For: 4.7, 5.0 Attachments: SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854.patch, SOLR-3854v2.patch, SOLR-3854v3.patch, SOLR-3854v4.patch There are a few places in current codebase that assume http is used. This prevents using https when running solr in cloud mode. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922723#comment-13922723 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574954 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574954 ] LUCENE-5493: javadocs Rename Sorter, NumericDocValuesSorter, and fix javadocs --- Key: LUCENE-5493 URL: https://issues.apache.org/jira/browse/LUCENE-5493 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5493-poc.patch Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922746#comment-13922746 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574962 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574962 ] LUCENE-5493: fix precommit Rename Sorter, NumericDocValuesSorter, and fix javadocs --- Key: LUCENE-5493 URL: https://issues.apache.org/jira/browse/LUCENE-5493 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5493-poc.patch Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922757#comment-13922757 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574965 from [~mikemccand] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574965 ] LUCENE-5493: don't do forceMerge on initital build of AnalyzingInfixSuggester Rename Sorter, NumericDocValuesSorter, and fix javadocs --- Key: LUCENE-5493 URL: https://issues.apache.org/jira/browse/LUCENE-5493 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5493-poc.patch Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented
John Wang created LUCENE-5495: - Summary: Boolean Filter does not handle FilterClauses with only bits() implemented Key: LUCENE-5495 URL: https://issues.apache.org/jira/browse/LUCENE-5495 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.6.1 Reporter: John Wang Some Filter implementations produce DocIdSets without the iterator() implementation, such as o.a.l.facet.range.Range.getFilter(). Currently, such filters cannot be added to a BooleanFilter, because BooleanFilter expects all FilterClauses to hold Filters that have iterator() implemented. This patch improves the behavior by taking Filters with bits() implemented and treating them separately. This would be faster in the case of Filters with a forward index as the underlying data structure, where there is no need to scan the index to build an iterator. See the attached unit test, which fails without this patch. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented
[ https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-5495: -- Attachment: LUCENE-5495.patch Boolean Filter does not handle FilterClauses with only bits() implemented - Key: LUCENE-5495 URL: https://issues.apache.org/jira/browse/LUCENE-5495 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.6.1 Reporter: John Wang Attachments: LUCENE-5495.patch Some Filter implementations produce DocIdSets without the iterator() implementation, such as o.a.l.facet.range.Range.getFilter(). Currently, such filters cannot be added to a BooleanFilter, because BooleanFilter expects all FilterClauses to hold Filters that have iterator() implemented. This patch improves the behavior by taking Filters with bits() implemented and treating them separately. This would be faster in the case of Filters with a forward index as the underlying data structure, where there is no need to scan the index to build an iterator. See the attached unit test, which fails without this patch. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922772#comment-13922772 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574969 from [~mikemccand] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574969 ] LUCENE-5493: fix solr Rename Sorter, NumericDocValuesSorter, and fix javadocs --- Key: LUCENE-5493 URL: https://issues.apache.org/jira/browse/LUCENE-5493 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5493-poc.patch Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5470) Refactoring multiterm analysis
[ https://issues.apache.org/jira/browse/LUCENE-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated LUCENE-5470: Attachment: LUCENE-5470_QPBase.patch LUCENE-5470_QBuilder.patch Two patches. One consolidates analysis in QueryBuilder and one consolidates it in QueryParserBase. I've added a check for position increment. I don't think this will create false exceptions, but let me know if anyone thinks otherwise. If we go with QueryBuilder, I'm not wedded to static. Refactoring multiterm analysis -- Key: LUCENE-5470 URL: https://issues.apache.org/jira/browse/LUCENE-5470 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 5.0 Reporter: Tim Allison Priority: Minor Attachments: LUCENE-5470.patch, LUCENE-5470_QBuilder.patch, LUCENE-5470_QPBase.patch There are currently three methods to analyze multiterms in Lucene and Solr: 1) QueryParserBase 2) AnalyzingQueryParser 3) TextField (Solr) The code in QueryParserBase and in TextField does not consume the tokenstream if more than one token is generated by the analyzer. (Admittedly, thanks to the magic of MultitermAwareComponents in Solr, this type of exception probably never happens and the unconsumed stream problem is probably non-existent in Solr.) I propose consolidating the multiterm analysis code into one place: QueryBuilder in Lucene core. This is part of a refactoring that will also help reduce duplication of code with LUCENE-5205. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
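As a concrete illustration of the behavior being consolidated, a helper along these lines (hypothetical, not the patch itself) analyzes a multiterm part, insists on exactly one token, and fully consumes and closes the stream either way:

{code}
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute;
import org.apache.lucene.util.BytesRef;

static BytesRef analyzeSingleToken(Analyzer analyzer, String field, String part)
    throws IOException {
  try (TokenStream ts = analyzer.tokenStream(field, part)) {
    TermToBytesRefAttribute bytesAtt = ts.addAttribute(TermToBytesRefAttribute.class);
    ts.reset();
    if (!ts.incrementToken()) {
      throw new IllegalArgumentException("analyzer eliminated multiterm part: " + part);
    }
    bytesAtt.fillBytesRef(); // 4.x API: copy the current term into the shared BytesRef
    BytesRef token = BytesRef.deepCopyOf(bytesAtt.getBytesRef());
    if (ts.incrementToken()) {
      // the case the description warns about: more than one token generated
      throw new IllegalArgumentException("analyzer produced more than one token: " + part);
    }
    ts.end();
    return token;
  }
}
{code}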
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922783#comment-13922783 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1574972 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574972 ] LUCENE-5493: add CHANGES Rename Sorter, NumericDocValuesSorter, and fix javadocs --- Key: LUCENE-5493 URL: https://issues.apache.org/jira/browse/LUCENE-5493 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5493-poc.patch, LUCENE-5493.patch Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5493: Attachment: LUCENE-5493.patch Here is a patch for review. The public API is much simpler and I think it makes the SortingMP a lot more flexible and easier to use. Rename Sorter, NumericDocValuesSorter, and fix javadocs --- Key: LUCENE-5493 URL: https://issues.apache.org/jira/browse/LUCENE-5493 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5493-poc.patch, LUCENE-5493.patch Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
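If the API lands the way the patch description suggests, with a plain Sort/SortField driving the sorter instead of the old Sorter/NumericDocValuesSorter classes (an assumption on my part; the attached patch is authoritative), wiring it up might look like:

{code}
// Sketch under the assumed post-patch API: SortingMergePolicy (lucene/misc)
// configured with an ordinary Sort rather than a custom Sorter implementation.
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_47, analyzer);
Sort indexOrder = new Sort(new SortField("timestamp", SortField.Type.LONG));
iwc.setMergePolicy(new SortingMergePolicy(iwc.getMergePolicy(), indexOrder));
IndexWriter writer = new IndexWriter(dir, iwc);
{code}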
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922786#comment-13922786 ] Tim Allison commented on LUCENE-5205: - [~rcmuir], if you have a chance to review and commit the Feb 28 patch for cleaning up the test cases, I'd greatly appreciate it! Thank you, again.

[PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser --- Key: LUCENE-5205 URL: https://issues.apache.org/jira/browse/LUCENE-5205 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Tim Allison Labels: patch Fix For: 4.7 Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt

This parser extends QueryParserBase and includes functionality from:
* Classic QueryParser: most of its syntax
* SurroundQueryParser: recursive parsing for near and not clauses
* ComplexPhraseQueryParser: can handle near queries that include multiterms (wildcard, fuzzy, regex, prefix)
* AnalyzingQueryParser: has an option to analyze multiterms

At a high level, there's a first-pass BooleanQuery/field parser, and then a span query parser handles all terminal nodes and phrases.

Same as classic syntax:
* term: test
* fuzzy: roam~0.8, roam~2
* wildcard: te?t, test*, t*st
* regex: /[mb]oat/
* phrase: "jakarta apache"
* phrase with slop: "jakarta apache"~3
* default or clause: jakarta apache
* grouping or clause: (jakarta apache)
* boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
* multiple fields: title:lucene author:hatcher

Main additions in SpanQueryParser syntax vs. classic syntax:
* Can require in-order for phrases with slop with the ~> operator: "jakarta apache"~>3
* Can specify not near: "fever bieber"!~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it.
* Fully recursive phrasal queries with [ and ]; as in: [[jakarta apache]~3 lucene]~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene.
* Can also use [] for single-level phrasal queries instead of " " as in: [jakarta apache]
* Can use "or" grouping clauses in phrasal queries: apache (lucene solr)~3 :: find apache and then either lucene or solr within three words.
* Can use multiterms in phrasal queries: jakarta~1 ap*che~2
* Did I mention full recursion: [[jakarta~1 ap*che]~2 (solr~ /l[ou]+[cs][en]+/)]~10 :: find something like jakarta within two words of ap*che, and that hit has to be within ten words of something like solr or that lucene regex.
* Can require at least x number of hits at boolean level: apache AND (lucene solr tika)~2
* Can use negative-only query: -jakarta :: find all docs that don't contain jakarta
* Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!).

Trivial additions:
* Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance = 1, prefix = 2)
* Can specify Optimal String Alignment (OSA) vs Levenshtein for edit distance <= 2: jakarta~1 (OSA) vs jakarta~1 (Levenshtein)

This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser.
Any and all feedback is welcome. Thank you. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
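For readers who want to try the patch, usage might look roughly like the sketch below. Everything here is assumed from the patch's description rather than a released Lucene API, so the package, constructor, and parse signature are guesses for illustration only.

{code}
// Hypothetical usage of the patch's SpanQueryParser (signatures assumed).
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
SpanQueryParser parser = new SpanQueryParser(Version.LUCENE_47, "content", analyzer);
// One of the recursive examples above: jakarta within 3 words of apache,
// and that hit within four words before lucene.
Query q = parser.parse("[[jakarta apache]~3 lucene]~4");
{code}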
[jira] [Updated] (SOLR-5768) Add a distrib.singlePass parameter to make EXECUTE_QUERY phase fetch all fields and skip GET_FIELDS
[ https://issues.apache.org/jira/browse/SOLR-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5768: Summary: Add a distrib.singlePass parameter to make EXECUTE_QUERY phase fetch all fields and skip GET_FIELDS (was: Add a distrib.singlePass parameter to make GET_FIELDS phase fetch all fields and skip EXECUTE_QUERY) Add a distrib.singlePass parameter to make EXECUTE_QUERY phase fetch all fields and skip GET_FIELDS --- Key: SOLR-5768 URL: https://issues.apache.org/jira/browse/SOLR-5768 Project: Solr Issue Type: Improvement Reporter: Shalin Shekhar Mangar Priority: Minor Fix For: 4.8, 5.0 Suggested by Yonik on solr-user: http://www.mail-archive.com/solr-user@lucene.apache.org/msg95045.html {quote} Although it seems like it should be relatively simple to make it work with other fields as well, by passing down the complete fl requested if some optional parameter is set (distrib.singlePass?) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
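For reference, once this lands the parameter would presumably be passed like any other request param, e.g. (host, collection, and field list illustrative):

{noformat}
http://localhost:8983/solr/collection1/select?q=*:*&fl=id,name,price&distrib.singlePass=true
{noformat}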
[jira] [Commented] (SOLR-5768) Add a distrib.singlePass parameter to make GET_FIELDS phase fetch all fields and skip EXECUTE_QUERY
[ https://issues.apache.org/jira/browse/SOLR-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922796#comment-13922796 ] Shalin Shekhar Mangar commented on SOLR-5768: - see it is already implemented ;) thanks, I'll fix. Add a distrib.singlePass parameter to make GET_FIELDS phase fetch all fields and skip EXECUTE_QUERY --- Key: SOLR-5768 URL: https://issues.apache.org/jira/browse/SOLR-5768 Project: Solr Issue Type: Improvement Reporter: Shalin Shekhar Mangar Priority: Minor Fix For: 4.8, 5.0 Suggested by Yonik on solr-user: http://www.mail-archive.com/solr-user@lucene.apache.org/msg95045.html {quote} Although it seems like it should be relatively simple to make it work with other fields as well, by passing down the complete fl requested if some optional parameter is set (distrib.singlePass?) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
JDK 8 : Third Release Candidate - Build 132 is available on java.net
Hi Uwe, Dawid, JDK 8 Third Release Candidate, Build 132, is now available for download and test: http://jdk8.java.net/download.html Please log all show-stopper issues as soon as possible. Thanks for your support, Rory -- Rgds, Rory O'Donnell Quality Engineering Manager Oracle EMEA, Dublin, Ireland
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922811#comment-13922811 ] Michael McCandless commented on LUCENE-5493: +1, looks great. This also makes it trivial to do impact-sorted postings by an arbitrary expression. Rename Sorter, NumericDocValuesSorter, and fix javadocs --- Key: LUCENE-5493 URL: https://issues.apache.org/jira/browse/LUCENE-5493 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5493-poc.patch, LUCENE-5493.patch Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5821) Search inconsistency on SolrCloud replicas
Maxim Novikov created SOLR-5821: --- Summary: Search inconsistency on SolrCloud replicas Key: SOLR-5821 URL: https://issues.apache.org/jira/browse/SOLR-5821 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6.1 Environment: CentOS 6.5, Tomcat 8.0.3, Solr 4.6.1 1 shard, 2 replicas (servers with identical hardware/software) Reporter: Maxim Novikov Priority: Critical We use the following infrastructure: SolrCloud with 1 shard and 2 replicas. The index is built using DataImportHandler (importing data from the database). The number of items in the index can vary from 100 to 100,000,000. After indexing part of the data (not necessarily all the data, it is enough to have a small number of items in the search index), we can observe that Solr instances (replicas) return different results for the same search queries. I believe it happens because some of the results have the same scores, and Solr instances return those in a random order. PS This is a critical issue for us as we use a load balancer to scale Solr through replicas, and as a result of this issue, we retrieve various results for the same queries all the time. They are not necessarily completely different, but even a couple of items that differ is a deal breaker. The expected behaviour would be to always get identical results for the same search queries from all replicas. Otherwise, this cloud thing works just unreliably. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
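Not a fix for the root cause reported here, but a common mitigation (standard Solr sort syntax; the id field below is illustrative and should be the schema's uniqueKey) is to add a deterministic tie-breaker after score, so that replicas order equal-score documents identically:

{noformat}
/select?q=some+query&sort=score desc,id asc
{noformat}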
[jira] [Commented] (LUCENE-5492) IndexFileDeleter AssertionError in presence of *_upgraded.si files
[ https://issues.apache.org/jira/browse/LUCENE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922847#comment-13922847 ] Tim Smith commented on LUCENE-5492: --- That seems to be the culprit. In my IndexWriter subclass, I overrode incRefDeleter and decRefDeleter to be no-ops and it no longer fails horribly. Hopefully this doesn't have any negative effects (it looks like that was all that was in the patch on LUCENE-5434, so worst-case scenario I just don't get to take advantage of the benefits there).

IndexFileDeleter AssertionError in presence of *_upgraded.si files -- Key: LUCENE-5492 URL: https://issues.apache.org/jira/browse/LUCENE-5492 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.7 Reporter: Tim Smith When calling IndexWriter.deleteUnusedFiles against an index that contains 3.x segments, I am seeing the following exception:

{code}
java.lang.AssertionError: failAndDumpStackJunitStatment: RefCount is 0 pre-decrement for file _0_upgraded.si
  at org.apache.lucene.index.IndexFileDeleter$RefCount.DecRef(IndexFileDeleter.java:630)
  at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:514)
  at org.apache.lucene.index.IndexFileDeleter.deleteCommits(IndexFileDeleter.java:286)
  at org.apache.lucene.index.IndexFileDeleter.revisitPolicy(IndexFileDeleter.java:393)
  at org.apache.lucene.index.IndexWriter.deleteUnusedFiles(IndexWriter.java:4617)
{code}

I believe this is caused by IndexFileDeleter not being aware of the Lucene3x SegmentInfos format (notably the _upgraded.si files created to upgrade an old index). This is new in 4.7 and did not occur in 4.6.1. Still trying to track down a workaround/fix. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
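A reconstruction of the workaround described in the comment (Tim's actual subclass isn't shown, so the no-op overrides below are an approximation, with signatures as I read the 4.7 IndexWriter):

{code}
IndexWriter writer = new IndexWriter(dir, iwc) {
  @Override
  public void incRefDeleter(SegmentInfos segmentInfos) throws IOException {
    // no-op: skip the LUCENE-5434 deleter ref-counting
  }

  @Override
  public void decRefDeleter(SegmentInfos segmentInfos) throws IOException {
    // no-op: avoids the "RefCount is 0 pre-decrement" assertion on *_upgraded.si
  }
};
{code}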
[jira] [Updated] (SOLR-5821) Search inconsistency on SolrCloud replicas
[ https://issues.apache.org/jira/browse/SOLR-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Novikov updated SOLR-5821: Environment: CentOS 6.5, 8Gb RAM, 4 CPUs, 100Gb HDD Tomcat 8.0.3, Solr 4.6.1 1 shard, 2 replicas (servers with identical hardware/software) was: CentOS 6.5, Tomcat 8.0.3, Solr 4.6.1 1 shard, 2 replicas (servers with identical hardware/software) Search inconsistency on SolrCloud replicas -- Key: SOLR-5821 URL: https://issues.apache.org/jira/browse/SOLR-5821 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6.1 Environment: CentOS 6.5, 8Gb RAM, 4 CPUs, 100Gb HDD Tomcat 8.0.3, Solr 4.6.1 1 shard, 2 replicas (servers with identical hardware/software) Reporter: Maxim Novikov Priority: Critical Labels: cloud, inconsistency, replica, search We use the following infrastructure: SolrCloud with 1 shard and 2 replicas. The index is built using DataImportHandler (importing data from the database). The number of items in the index can vary from 100 to 100,000,000. After indexing part of the data (not necessarily all the data, it is enough to have a small number of items in the search index), we can observe that Solr instances (replicas) return different results for the same search queries. I believe it happens because some of the results have the same scores, and Solr instances return those in a random order. PS This is a critical issue for us as we use a load balancer to scale Solr through replicas, and as a result of this issue, we retrieve various results for the same queries all the time. They are not necessarily completely different, but even a couple of items that differ is a deal breaker. The expected behaviour would be to always get identical results for the same search queries from all replicas. Otherwise, this cloud thing works just unreliably. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5821) Search inconsistency on SolrCloud replicas
[ https://issues.apache.org/jira/browse/SOLR-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Novikov updated SOLR-5821: Environment: SolrCloud: 1 shard, 2 replicas Both instances/replicas have identical hardware/software: CPU(s): 4 RAM: 8Gb HDD: 100Gb OS: CentOS 6.5 ZooKeeper 3.4.5 Tomcat 8.0.3 Solr 4.6.1 Servers are utilized to run Solr only. was: CentOS 6.5, 8Gb RAM, 4 CPUs, 100Gb HDD Tomcat 8.0.3, Solr 4.6.1 1 shard, 2 replicas (servers with identical hardware/software) Search inconsistency on SolrCloud replicas -- Key: SOLR-5821 URL: https://issues.apache.org/jira/browse/SOLR-5821 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6.1 Environment: SolrCloud: 1 shard, 2 replicas Both instances/replicas have identical hardware/software: CPU(s): 4 RAM: 8Gb HDD: 100Gb OS: CentOS 6.5 ZooKeeper 3.4.5 Tomcat 8.0.3 Solr 4.6.1 Servers are utilized to run Solr only. Reporter: Maxim Novikov Priority: Critical Labels: cloud, inconsistency, replica, search We use the following infrastructure: SolrCloud with 1 shard and 2 replicas. The index is built using DataImportHandler (importing data from the database). The number of items in the index can vary from 100 to 100,000,000. After indexing part of the data (not necessarily all the data, it is enough to have a small number of items in the search index), we can observe that Solr instances (replicas) return different results for the same search queries. I believe it happens because some of the results have the same scores, and Solr instances return those in a random order. PS This is a critical issue for us as we use a load balancer to scale Solr through replicas, and as a result of this issue, we retrieve various results for the same queries all the time. They are not necessarily completely different, but even a couple of items that differ is a deal breaker. The expected behaviour would be to always get identical results for the same search queries from all replicas. Otherwise, this cloud thing works just unreliably. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922884#comment-13922884 ] Shai Erera commented on LUCENE-5476: bq. but any facet accumulation which would rely on document scores would be hit by the second as the scores That's a great point Gilad. We need a test which covers that with the random sampling collector. bq. Is there a reason to add more randomness to one test? It depends. I have a problem with numDocs=10,000 and percents being 10% .. it creates too-perfect numbers, if you know what I mean. I prefer a random number of documents to add some spice to the test. Since we're testing a random sampler, I don't think it makes sense to test it with a fixed seed (0xdeadbeef) ... this collector is all about randomness, so we should stress the randomness done there. Given our test framework, randomness is not a big deal at all, since once we get a test failure, we can deterministically reproduce the failure (when there is no multi-threading). So I say YES, in this test I think we should have randomness. But e.g. when you add a test which ensures the collector works well w/ sampled docs and scores, I don't think you should add randomness -- it's OK to test it once. Also, in terms of test coverage, there are other cases which I think would be good if they were tested: * Docs + Scores (discussed above) * Multi-segment indexes (ensuring we work well there) * Different number of hits per-segment (to make sure our sampling on tiny segments works well too) * ... I wouldn't for example use RandomIndexWriter because we're only testing search. If we want many segments, we should commit/nrt-open every few segments, disable the merge policy etc. These can be separate, real unit tests. bq. Sorry, I don't get what you mean by this. I meant that if you set {{numDocs = atLeast(8000)}}, then the 10% sampler should not be hardcoded to 1,000, but {{numDocs * 0.1}}. bq. the original totalHits .. is used I think that's OK. In fact, if we don't record that, it would be hard to fix the counts, no? {quote} There will be 5 facet values (0, 2, 4, 6 and 8), as only the even documents (i % 10) are hits. There is a REAL small chance that one of the five values will be entirely missed when sampling. But is that 0.8 (chance not to take a value) ^ 2000 * 5 (any can be missing) ~ 10^-193, so that is probably not going to happen {quote} Ahh thanks, I missed that. I agree it's very improbable that one of the values is missing, but if we can avoid that at all it's better. First, it's not one of the values, we could be missing even 2, right -- it really depends on randomness. I find this assert just redundant -- if we always expect 5, we shouldn't assert that we received 5. If we say that very infrequently we might get <5 and we're OK with it .. what's the point of asserting that at all? bq. I renamed the sampleThreshold to sampleSize. It currently picks a samplingRatio that will reduce the number of hits to the sampleSize, if the number of hits is greater. It looks like it hasn't changed? I mean besides the rename. So if I set sampleSize=100K, it's 100K whether there are 101K docs or 100M docs, right? Is that your intention?
Facet sampling -- Key: LUCENE-5476 URL: https://issues.apache.org/jira/browse/LUCENE-5476 Project: Lucene - Core Issue Type: Improvement Reporter: Rob Audenaerde Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java With LUCENE-5339 facet sampling disappeared. When trying to display facet counts on large datasets (10M documents) counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
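Spelling out the back-of-the-envelope from the quoted comment: each of the 5 facet values occurs in 2000 of the hits, and (in that test) each hit is skipped with probability 0.8, so missing a value entirely means skipping all 2000 of its documents.

{code}
double pMissOneValue = Math.pow(0.8, 2000);   // ~ 1.6e-194
double pAnyValueMissing = 5 * pMissOneValue;  // union bound over 5 values, ~ 1e-193
{code}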
[jira] [Comment Edited] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922884#comment-13922884 ] Shai Erera edited comment on LUCENE-5476 at 3/6/14 6:56 PM: bq. but any facet accumulation which would rely on document scores would be hit by the second as the scores That's a great point Gilad. We need a test which covers that with the random sampling collector. bq. Is there a reason to add more randomness to one test? It depends. I have a problem with numDocs=10,000 and percents being 10% .. it creates too-perfect numbers, if you know what I mean. I prefer a random number of documents to add some spice to the test. Since we're testing a random sampler, I don't think it makes sense to test it with a fixed seed (0xdeadbeef) ... this collector is all about randomness, so we should stress the randomness done there. Given our test framework, randomness is not a big deal at all, since once we get a test failure, we can deterministically reproduce the failure (when there is no multi-threading). So I say YES, in this test I think we should have randomness. But e.g. when you add a test which ensures the collector works well w/ sampled docs and scores, I don't think you should add randomness -- it's OK to test it once. Also, in terms of test coverage, there are other cases which I think would be good if they were tested: * Docs + Scores (discussed above) * Multi-segment indexes (ensuring we work well there) * Different number of hits per-segment (to make sure our sampling on tiny segments works well too) * ... I wouldn't for example use RandomIndexWriter because we're only testing search (and so it just adds noise in this test). If we want many segments, we should commit/nrt-open every few docs, disable the merge policy etc. These can be separate, real unit tests. bq. Sorry, I don't get what you mean by this. I meant that if you set {{numDocs = atLeast(8000)}}, then the 10% sampler should not be hardcoded to 1,000, but {{numDocs * 0.1}}. bq. the original totalHits .. is used I think that's OK. In fact, if we don't record that, it would be hard to fix the counts, no? {quote} There will be 5 facet values (0, 2, 4, 6 and 8), as only the even documents (i % 10) are hits. There is a REAL small chance that one of the five values will be entirely missed when sampling. But is that 0.8 (chance not to take a value) ^ 2000 * 5 (any can be missing) ~ 10^-193, so that is probably not going to happen {quote} Ahh thanks, I missed that. I agree it's very improbable that one of the values is missing, but if we can avoid that at all it's better. First, it's not one of the values, we could be missing even 2, right -- it really depends on randomness. I find this assert just redundant -- if we always expect 5, we shouldn't assert that we received 5. If we say that very infrequently we might get <5 and we're OK with it .. what's the point of asserting that at all? bq. I renamed the sampleThreshold to sampleSize. It currently picks a samplingRatio that will reduce the number of hits to the sampleSize, if the number of hits is greater. It looks like it hasn't changed? I mean besides the rename. So if I set sampleSize=100K, it's 100K whether there are 101K docs or 100M docs, right? Is that your intention? was (Author: shaie): bq. but any facet accumulation which would rely on document scores would be hit by the second as the scores That's a great point Gilad. We need a test which covers that with the random sampling collector. bq. Is there a reason to add more randomness to one test? It depends. I have a problem with numDocs=10,000 and percents being 10% .. it creates too-perfect numbers, if you know what I mean. I prefer a random number of documents to add some spice to the test. Since we're testing a random sampler, I don't think it makes sense to test it with a fixed seed (0xdeadbeef) ... this collector is all about randomness, so we should stress the randomness done there. Given our test framework, randomness is not a big deal at all, since once we get a test failure, we can deterministically reproduce the failure (when there is no multi-threading). So I say YES, in this test I think we should have randomness. But e.g. when you add a test which ensures the collector works well w/ sampled docs and scores, I don't think you should add randomness -- it's OK to test it once. Also, in terms of test coverage, there are other cases which I think would be good if they were tested: * Docs + Scores (discussed above) * Multi-segment indexes (ensuring we work well there) * Different number of hits per-segment (to make sure our sampling on tiny segments works well too) * ... I wouldn't for example use RandomIndexWriter because we're only testing search. If we want many segments, we should commit/nrt-open every few segments, disable the merge policy etc. These can
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922896#comment-13922896 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1575008 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1575008 ] LUCENE-5493: javadocs cleanups Rename Sorter, NumericDocValuesSorter, and fix javadocs --- Key: LUCENE-5493 URL: https://issues.apache.org/jira/browse/LUCENE-5493 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5493-poc.patch, LUCENE-5493.patch Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5493) Rename Sorter, NumericDocValuesSorter, and fix javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922918#comment-13922918 ] ASF subversion and git services commented on LUCENE-5493: - Commit 1575017 from [~rcmuir] in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1575017 ] LUCENE-5493: add missing experimental tag Rename Sorter, NumericDocValuesSorter, and fix javadocs --- Key: LUCENE-5493 URL: https://issues.apache.org/jira/browse/LUCENE-5493 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5493-poc.patch, LUCENE-5493.patch Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField. These need to be renamed or, even better, the API fixed so they aren't public classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-SmokeRelease-trunk - Build # 163 - Failure
On Mar 6, 2014, at 1:24 PM, Michael McCandless luc...@mikemccandless.com wrote: Should we stop running solr tests in the smoke tester? I think the current best bet if people insist on running Solr tests in the smoke tester is to do it with -Dtests.slow=false. - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented
[ https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922940#comment-13922940 ] Lei Wang commented on LUCENE-5495: --

+ public BitsDocIdSet(Bits bits, int length) {
+   this.bits = bits;
+   this.length = length;
+ }

We can assert bits is not a DocIdSet here. If it is, this adds overhead only.

+ final List<Bits> mustBitsList = new ArrayList<Bits>();
+ final List<Bits> mustNotBitsList = new ArrayList<Bits>();

May need a SHOULD list also?

+ if (bits != null) {
+   mustNotBitsList.add(bits);
+ }

If bits is already a FixedBitSet or OpenBitSet, merging them into res might be faster? Same for the other lists.

Boolean Filter does not handle FilterClauses with only bits() implemented - Key: LUCENE-5495 URL: https://issues.apache.org/jira/browse/LUCENE-5495 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.6.1 Reporter: John Wang Attachments: LUCENE-5495.patch Some Filter implementations produce DocIdSets without the iterator() implementation, such as o.a.l.facet.range.Range.getFilter(). Currently, such filters cannot be added to a BooleanFilter, because BooleanFilter expects all FilterClauses to hold Filters that have iterator() implemented. This patch improves the behavior by taking Filters with bits() implemented and treating them separately. This would be faster in the case of Filters with a forward index as the underlying data structure, where there is no need to scan the index to build an iterator. See the attached unit test, which fails without this patch. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented
[ https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922940#comment-13922940 ] Lei Wang edited comment on LUCENE-5495 at 3/6/14 7:35 PM: --

{noformat}
+ public BitsDocIdSet(Bits bits, int length) {
+   this.bits = bits;
+   this.length = length;
+ }
{noformat}

We can assert bits is not a DocIdSet here. If it is, this adds overhead only.

{noformat}
+ final List<Bits> mustBitsList = new ArrayList<Bits>();
+ final List<Bits> mustNotBitsList = new ArrayList<Bits>();
{noformat}

May need a SHOULD list also?

{noformat}
+ if (bits != null) {
+   mustNotBitsList.add(bits);
+ }
{noformat}

If bits is already a FixedBitSet or OpenBitSet, merging them into res might be faster? Same for the other lists.

was (Author: wonlay):
+ public BitsDocIdSet(Bits bits, int length) {
+   this.bits = bits;
+   this.length = length;
+ }
We can assert bits is not a DocIdSet here. If it is, this adds overhead only.
+ final List<Bits> mustBitsList = new ArrayList<Bits>();
+ final List<Bits> mustNotBitsList = new ArrayList<Bits>();
May need a SHOULD list also?
+ if (bits != null) {
+   mustNotBitsList.add(bits);
+ }
If bits is already a FixedBitSet or OpenBitSet, merging them into res might be faster? Same for the other lists.

Boolean Filter does not handle FilterClauses with only bits() implemented - Key: LUCENE-5495 URL: https://issues.apache.org/jira/browse/LUCENE-5495 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.6.1 Reporter: John Wang Attachments: LUCENE-5495.patch Some Filter implementations produce DocIdSets without the iterator() implementation, such as o.a.l.facet.range.Range.getFilter(). Currently, such filters cannot be added to a BooleanFilter, because BooleanFilter expects all FilterClauses to hold Filters that have iterator() implemented. This patch improves the behavior by taking Filters with bits() implemented and treating them separately. This would be faster in the case of Filters with a forward index as the underlying data structure, where there is no need to scan the index to build an iterator. See the attached unit test, which fails without this patch. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
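For readers without the patch in front of them, a minimal self-contained version of a bits()-only DocIdSet like the patch's BitsDocIdSet might look as follows. Only the constructor signature is quoted from the patch; the rest is a sketch against the 4.x DocIdSet API.

{code}
import java.io.IOException;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;

// Wraps random-access Bits as a DocIdSet that exposes no iterator, mirroring
// filters built on forward indexes such as o.a.l.facet.range.Range.getFilter().
final class BitsDocIdSet extends DocIdSet {
  private final Bits bits;
  private final int length;

  public BitsDocIdSet(Bits bits, int length) {
    this.bits = bits;
    this.length = length;
  }

  @Override
  public Bits bits() throws IOException {
    return bits; // random access is supported...
  }

  @Override
  public DocIdSetIterator iterator() throws IOException {
    // ...but iteration is not -- exactly the case BooleanFilter can't handle today.
    throw new UnsupportedOperationException("access this DocIdSet via bits() only");
  }
}
{code}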
Re: [JENKINS] Lucene-Solr-SmokeRelease-trunk - Build # 163 - Failure
On Thu, Mar 6, 2014 at 2:33 PM, Mark Miller markrmil...@gmail.com wrote: On Mar 6, 2014, at 1:24 PM, Michael McCandless luc...@mikemccandless.com wrote: Should we stop running solr tests in the smoke tester? I think the current best bet if people insist on running Solr tests in the smoke tester is to do it with -Dtests.slow=false. I was the one that added it. I don't insist on running them in the smokeTester, I just felt like it was the right thing to do. Do you think we should turn them off? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5596) OverseerTest.testOverseerFailure - leader node already exists.
[ https://issues.apache.org/jira/browse/SOLR-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922947#comment-13922947 ] Mark Miller commented on SOLR-5596: --- So we still hit this - pretty surprising. I've gone over the test a couple times and have not spotted the problem yet, but I think it must be an issue with the test. OverseerTest.testOverseerFailure - leader node already exists. -- Key: SOLR-5596 URL: https://issues.apache.org/jira/browse/SOLR-5596 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.8, 5.0 Seeing this a bunch on jenkins - previous leader ephemeral node is still around for some reason. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-SmokeRelease-trunk - Build # 163 - Failure
On Mar 6, 2014, at 2:36 PM, Robert Muir rcm...@gmail.com wrote: On Thu, Mar 6, 2014 at 2:33 PM, Mark Miller markrmil...@gmail.com wrote: On Mar 6, 2014, at 1:24 PM, Michael McCandless luc...@mikemccandless.com wrote: Should we stop running solr tests in the smoke tester? I think the current best bet if people insist on running Solr tests in the smoke tester is to do it with -Dtests.slow=false. I was the one that added it. I don't insist on running them in the smokeTester, I just felt like it was the right thing to do. Do you think we should turn them off? Like I said, I think if we want to run them currently, we should do it with -Dtests.slow=false. I do think it would be nice to be able to run them all, but I think step one is probably going from no tests to -Dtests.slow=false. With a little effort from other Solr devs, we are not too far off from being able to do the whole suite at this point - there has been a bunch of progress from a variety of sources over the past few weeks. - Mark http://about.me/markrmiller - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-SmokeRelease-trunk - Build # 163 - Failure
I think it's a good step. Mainly I hate not running them at all like before. At least this way we are testing, even if some are disabled. I'll fix it. On Thu, Mar 6, 2014 at 2:45 PM, Mark Miller markrmil...@gmail.com wrote: On Mar 6, 2014, at 2:36 PM, Robert Muir rcm...@gmail.com wrote: On Thu, Mar 6, 2014 at 2:33 PM, Mark Miller markrmil...@gmail.com wrote: On Mar 6, 2014, at 1:24 PM, Michael McCandless luc...@mikemccandless.com wrote: Should we stop running solr tests in the smoke tester? I think the current best bet if people insist on running Solr tests in the smoke tester is to do it with -Dtests.slow=false. I was the one that added it. I don't insist on running them in the smokeTester, I just felt like it was the right thing to do. Do you think we should turn them off? Like I said, I think if we want to run them currently, we should do it with -Dtests.slow=false. I do think it would be nice to be able to run them all, but I think step one is probably going from no tests to -Dtests.slow=false. With a little effort from other Solr devs, we are not too far off from being able to do the whole suite at this point - there has been a bunch of progress from a variety of sources over the past few weeks. - Mark http://about.me/markrmiller - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Dataimport handler
I'm using the DataImportHandler to index data from a MySQL DB. Been running it just fine with full-imports. I'm now trying to implement the delta-import functionality. To implement the delta query, you need to read the last_index_time from a properties file to know what new data to index. So I'm using the parameter ${dataimporter.last_index_time} within my query. The problem is that when I use this, the date is always: Thu Jan 01 00:00:00 UTC 1970. It never actually reads the correct date stored in the dataimport.properties file, so my delta query does not work. Has anybody seen this issue? Seems like it's always using the beginning date for epoch, or unix timestamp 0. --Pritesh P.S. If you want to see the delta query, see below.

deltaQuery=SELECT node.nid from node where node.type = 'news' and node.status = 1 and (node.changed > UNIX_TIMESTAMP('${dataimporter.last_index_time}') or node.created > UNIX_TIMESTAMP('${dataimporter.last_index_time}'))

deltaImportQuery=SELECT node.nid, node.vid, node.type, node.language, node.title, node.uid, node.status, FROM_UNIXTIME(node.created,'%Y-%m-%dT%TZ') as created, FROM_UNIXTIME(node.changed,'%Y-%m-%dT%TZ') as changed, node.comment, node.promote, node.moderate, node.sticky, node.tnid, node.translate, content_type_news.field_image_credit_value, content_type_news.field_image_caption_value, content_type_news.field_subhead_value, content_type_news.field_author_value, content_type_news.field_dateline_value, content_type_news.field_article_image_fid, content_type_news.field_article_image_list, content_type_news.field_article_image_data, content_type_news.field_news_blurb_value, content_type_news.field_news_blurb_format, content_type_news.field_news_syndicate_value, content_type_news.field_news_video_reference_nid, content_type_news.field_news_inline_location_value, content_type_news.field_article_contributor_nid, content_type_news.field_news_title_value, page_title.page_title FROM node LEFT JOIN content_type_news ON node.nid = content_type_news.nid LEFT JOIN page_title ON node.nid = page_title.id where node.type = 'news' and node.status = 1 and node.nid = '${deltaimport.delta.nid}'
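For comparison, a healthy dataimport.properties written by DIH normally looks something like this (the timestamps and the entity-prefixed key are illustrative):

    #Thu Mar 06 19:00:00 UTC 2014
    last_index_time=2014-03-06 19\:00\:00
    news.last_index_time=2014-03-06 19\:00\:00

If the file can't be found or read, ${dataimporter.last_index_time} falls back to the epoch default, which matches the Thu Jan 01 00:00:00 UTC 1970 you're seeing, so it's worth checking that dataimport.properties exists in the core's conf directory and is writable by the servlet container.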
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922974#comment-13922974 ] Robert Muir commented on LUCENE-5205: - Sorry Tim! I'll try to get to this today. [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser --- Key: LUCENE-5205 URL: https://issues.apache.org/jira/browse/LUCENE-5205 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Tim Allison Labels: patch Fix For: 4.7 Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt This parser extends QueryParserBase and includes functionality from: * Classic QueryParser: most of its syntax * SurroundQueryParser: recursive parsing for near and not clauses. * ComplexPhraseQueryParser: can handle near queries that include multiterms (wildcard, fuzzy, regex, prefix), * AnalyzingQueryParser: has an option to analyze multiterms. At a high level, there's a first pass BooleanQuery/field parser and then a span query parser handles all terminal nodes and phrases. Same as classic syntax: * term: test * fuzzy: roam~0.8, roam~2 * wildcard: te?t, test*, t*st * regex: /\[mb\]oat/ * phrase: jakarta apache * phrase with slop: jakarta apache~3 * default or clause: jakarta apache * grouping or clause: (jakarta apache) * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta * multiple fields: title:lucene author:hatcher Main additions in SpanQueryParser syntax vs. classic syntax: * Can require in order for phrases with slop with the \~ operator: jakarta apache\~3 * Can specify not near: fever bieber!\~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it. * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene * Can also use \[\] for single level phrasal queries instead of as in: \[jakarta apache\] * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 :: find apache and then either lucene or solr within three words. * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two words of ap*che and that hit has to be within ten words of something like solr or that lucene regex. * Can require at least x number of hits at boolean level: apache AND (lucene solr tika)~2 * Can use negative only query: -jakarta :: Find all docs that don't contain jakarta * Can use an edit distance 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!). Trivial additions: * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, prefix =2) * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein) This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser. Any and all feedback is welcome. Thank you. 
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5496) Nuke fuzzyMinSim and replace with maxEdits for FuzzyQuery and its friends
Tim Allison created LUCENE-5496: --- Summary: Nuke fuzzyMinSim and replace with maxEdits for FuzzyQuery and its friends Key: LUCENE-5496 URL: https://issues.apache.org/jira/browse/LUCENE-5496 Project: Lucene - Core Issue Type: Task Components: core/queryparser, core/search Affects Versions: 4.8, 5.0 Reporter: Tim Allison Priority: Minor As we get closer to 5.0, I propose adding some deprecations in the queryparsers realm of 4.x. Are we ready to get rid of all fuzzyMinSims in trunk? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922978#comment-13922978 ] Tim Allison commented on LUCENE-5205: - You've had far bigger fish to fry...np at all! [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser --- Key: LUCENE-5205 URL: https://issues.apache.org/jira/browse/LUCENE-5205 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Tim Allison Labels: patch Fix For: 4.7 Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt This parser extends QueryParserBase and includes functionality from: * Classic QueryParser: most of its syntax * SurroundQueryParser: recursive parsing for near and not clauses. * ComplexPhraseQueryParser: can handle near queries that include multiterms (wildcard, fuzzy, regex, prefix), * AnalyzingQueryParser: has an option to analyze multiterms. At a high level, there's a first pass BooleanQuery/field parser and then a span query parser handles all terminal nodes and phrases. Same as classic syntax: * term: test * fuzzy: roam~0.8, roam~2 * wildcard: te?t, test*, t*st * regex: /\[mb\]oat/ * phrase: jakarta apache * phrase with slop: jakarta apache~3 * default or clause: jakarta apache * grouping or clause: (jakarta apache) * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta * multiple fields: title:lucene author:hatcher Main additions in SpanQueryParser syntax vs. classic syntax: * Can require in order for phrases with slop with the \~ operator: jakarta apache\~3 * Can specify not near: fever bieber!\~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it. * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene * Can also use \[\] for single level phrasal queries instead of as in: \[jakarta apache\] * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 :: find apache and then either lucene or solr within three words. * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two words of ap*che and that hit has to be within ten words of something like solr or that lucene regex. * Can require at least x number of hits at boolean level: apache AND (lucene solr tika)~2 * Can use negative only query: -jakarta :: Find all docs that don't contain jakarta * Can use an edit distance 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!). Trivial additions: * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, prefix =2) * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein) This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser. Any and all feedback is welcome. Thank you. 
[jira] [Updated] (LUCENE-5496) Nuke fuzzyMinSim and replace with maxEdits for FuzzyQuery and its friends
[ https://issues.apache.org/jira/browse/LUCENE-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated LUCENE-5496:
--------------------------------
Attachment: LUCENE-5496_4x_deprecations.patch

Deprecations for 4x. This doesn't touch:
* SlowFuzzyQuery -- deprecated anyway
* FuzzyTermsEnum -- on the theory that if you're extending this, you know what's coming
* EDismax in Solr -- ditto

I'm not even sure we need these added deprecations in 4x, but I'm attaching this in case the community would like to add them.

Nuke fuzzyMinSim and replace with maxEdits for FuzzyQuery and its friends
-------------------------------------------------------------------------

Key: LUCENE-5496
URL: https://issues.apache.org/jira/browse/LUCENE-5496
Project: Lucene - Core
Issue Type: Task
Components: core/queryparser, core/search
Affects Versions: 4.8, 5.0
Reporter: Tim Allison
Priority: Minor
Attachments: LUCENE-5496_4x_deprecations.patch

As we get closer to 5.0, I propose adding some deprecations in the queryparsers realm of 4.x. Are we ready to get rid of all fuzzyMinSims in trunk?
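As a point of reference, the edit-distance-based constructor already exists on 4.x, so the deprecation mostly steers users away from float similarities; a minimal sketch using only the stock FuzzyQuery API:

{noformat}
// Sketch: expressing fuzziness directly as maxEdits rather than as a
// float similarity (fuzzyMinSim) that has to be mapped to an edit distance.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

public class MaxEditsSketch {
    public static void main(String[] args) {
        Term term = new Term("title", "jakarta");
        // Old world: a float similarity, e.g. jakarta~0.8 in parser syntax.
        // New world: say what you mean -- at most 1 edit, via the existing
        // FuzzyQuery(Term, int maxEdits) constructor.
        FuzzyQuery q = new FuzzyQuery(term, 1);
        System.out.println(q);
    }
}
{noformat}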
[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922994#comment-13922994 ]

Anshum Gupta commented on SOLR-5477:
------------------------------------

[~markrmil...@gmail.com] I haven't added anything for SolrJ, so for now it doesn't really support async calls. I am assuming that by "collection API SolrJ calls" you mean methods like CollectionAdminRequest.createCollection(). Also, I'm working on adding some stress tests, i.e. something that fires multiple async requests.

Async execution of OverseerCollectionProcessor tasks
----------------------------------------------------

Key: SOLR-5477
URL: https://issues.apache.org/jira/browse/SOLR-5477
Project: Solr
Issue Type: Sub-task
Components: SolrCloud
Reporter: Noble Paul
Assignee: Anshum Gupta
Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477-updated.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch

Typical collection admin commands are long-running, and it is very common to have the requests time out. It is more of a problem if the cluster is very large. Add an option to run these commands asynchronously, as follows (see the sketch after this list):
* Add an extra param async=true for all collection commands.
* The task is written to ZK and the caller is returned a task id.
* A separate collection admin command will be added to poll the status of the task: command=status&id=7657668909. If no id is passed, all running async tasks should be listed.
* A separate queue is created to store in-process tasks. After a task is completed, its queue entry is removed.
* OverseerCollectionProcessor will perform these tasks in multiple threads.
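Since SolrJ support isn't wired up yet, exercising the async path presumably means raw HTTP for now; a hedged sketch using only the parameter names from the description above (the endpoint shape and the status command's final spelling are assumptions and may differ in the committed patch):

{noformat}
// Sketch of driving the proposed async collection API over plain HTTP.
// async=true and command=status&id=... come from the issue description;
// treat both as proposals, not the final API.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class AsyncCollectionSketch {
    public static void main(String[] args) throws Exception {
        // Fire the long-running command; the response should carry a task id.
        System.out.println(get("http://localhost:8983/solr/admin/collections"
                + "?action=CREATE&name=test&numShards=2&async=true"));
        // Poll the task (7657668909 is the example id from the description).
        System.out.println(get("http://localhost:8983/solr/admin/collections"
                + "?command=status&id=7657668909"));
    }

    static String get(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
        StringBuilder sb = new StringBuilder();
        for (String line; (line = in.readLine()) != null; ) {
            sb.append(line).append('\n');
        }
        in.close();
        return sb.toString();
    }
}
{noformat}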
[jira] [Commented] (SOLR-5822) ChaosMonkeyNothingIsSafeTest.testDistribSearch fail, shard inconsistency, off by 1
[ https://issues.apache.org/jira/browse/SOLR-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923001#comment-13923001 ]

Mark Miller commented on SOLR-5822:
-----------------------------------

{noformat}
[exec] [junit4] 2> 1114710 T4403 C687 P55523 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2&CONTROL=TRUE} {add=[0-672 (1461849649443766272)]} 0 1
[exec] [junit4] 2> 1114718 T4530 C692 P29636 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2&distrib.from=https://127.0.0.1:10132/collection1/&update.distrib=FROMLEADER} {add=[0-672 (1461849649447960576)]} 0 1
[exec] [junit4] 2> 1114722 T4838 C693 P17000 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2&distrib.from=https://127.0.0.1:10132/collection1/&update.distrib=FROMLEADER} {add=[0-672 (1461849649447960576)]} 0 5
[exec] [junit4] 2> 1114723 T4483 C686 P10132 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[0-672 (1461849649447960576)]} 0 9
[exec] [junit4] 2> 1117236 T4403 C694 P55523 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2&CONTROL=TRUE} {delete=[0-672 (-1461849652091420672)]} 0 1
[exec] [junit4] 2> 1117242 T4530 C695 P29636 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2&distrib.from=https://127.0.0.1:10132/collection1/&update.distrib=FROMLEADER} {delete=[0-672 (-1461849652095614976)]} 0 0
[exec] [junit4] 2> 1123567 T4867 C702 P17000 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2&distrib.from=https://127.0.0.1:10132/collection1/&update.distrib=FROMLEADER} {delete=[0-672 (-1461849652095614976)]} 0 1
[exec] [junit4] 2> 1123568 T4483 C703 P10132 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {delete=[0-672 (-1461849652095614976)]} 0 6329
[exec] [junit4] 2> ## Only in https://127.0.0.1:17000/collection1: [{id=0-672, _version_=1461849649447960576}]
{noformat}

ChaosMonkeyNothingIsSafeTest.testDistribSearch fail, shard inconsistency, off by 1
----------------------------------------------------------------------------------

Key: SOLR-5822
URL: https://issues.apache.org/jira/browse/SOLR-5822
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.8, 5.0

ChaosMonkeyNothingIsSafeTest.testDistribSearch

[exec] [junit4] Throwable #1: java.lang.AssertionError: shard2 is not consistent. Got 300 from https://127.0.0.1:17000/collection1 lastClient and got 299 from https://127.0.0.1:10132/collection1
[jira] [Created] (SOLR-5822) ChaosMonkeyNothingIsSafeTest.testDistribSearch fail, shard inconsistency, off by 1
Mark Miller created SOLR-5822:
------------------------------

Summary: ChaosMonkeyNothingIsSafeTest.testDistribSearch fail, shard inconsistency, off by 1
Key: SOLR-5822
URL: https://issues.apache.org/jira/browse/SOLR-5822
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.8, 5.0

ChaosMonkeyNothingIsSafeTest.testDistribSearch

[exec] [junit4] Throwable #1: java.lang.AssertionError: shard2 is not consistent. Got 300 from https://127.0.0.1:17000/collection1 lastClient and got 299 from https://127.0.0.1:10132/collection1
[jira] [Commented] (SOLR-5822) ChaosMonkeyNothingIsSafeTest.testDistribSearch fail, shard inconsistency, off by 1
[ https://issues.apache.org/jira/browse/SOLR-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923004#comment-13923004 ]

Mark Miller commented on SOLR-5822:
-----------------------------------

Looks like it's finding 0-672 on P17000, even though it looks like the delete of 0-672 on P17000 was received fine.

ChaosMonkeyNothingIsSafeTest.testDistribSearch fail, shard inconsistency, off by 1
----------------------------------------------------------------------------------

Key: SOLR-5822
URL: https://issues.apache.org/jira/browse/SOLR-5822
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.8, 5.0

ChaosMonkeyNothingIsSafeTest.testDistribSearch

[exec] [junit4] Throwable #1: java.lang.AssertionError: shard2 is not consistent. Got 300 from https://127.0.0.1:17000/collection1 lastClient and got 299 from https://127.0.0.1:10132/collection1
[jira] [Updated] (SOLR-5822) ChaosMonkeyNothingIsSafeTest.testDistribSearch fail, shard inconsistency, off by 1
[ https://issues.apache.org/jira/browse/SOLR-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-5822:
------------------------------
Attachment: solr.logs

ChaosMonkeyNothingIsSafeTest.testDistribSearch fail, shard inconsistency, off by 1
----------------------------------------------------------------------------------

Key: SOLR-5822
URL: https://issues.apache.org/jira/browse/SOLR-5822
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.8, 5.0
Attachments: solr.logs

ChaosMonkeyNothingIsSafeTest.testDistribSearch

[exec] [junit4] Throwable #1: java.lang.AssertionError: shard2 is not consistent. Got 300 from https://127.0.0.1:17000/collection1 lastClient and got 299 from https://127.0.0.1:10132/collection1
[jira] [Commented] (LUCENE-5487) Can we separate top scorer from sub scorer?
[ https://issues.apache.org/jira/browse/LUCENE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923119#comment-13923119 ]

ASF subversion and git services commented on LUCENE-5487:
----------------------------------------------------------

Commit 1575057 from [~mikemccand] in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1575057 ]

LUCENE-5487: add TopScorers to FilteredQuery too; fix Solr; resolve all nocommits

Can we separate top scorer from sub scorer?
-------------------------------------------

Key: LUCENE-5487
URL: https://issues.apache.org/jira/browse/LUCENE-5487
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: LUCENE-5487.patch, LUCENE-5487.patch

This is just an exploratory patch ... still many nocommits, but I think it may be promising. I find the two booleans we pass to Weight.scorer confusing, because they really only apply to whoever will call score(Collector) (just IndexSearcher and BooleanScorer). The params are pointless for the vast majority of scorers, because very, very few query scorers really need to change how top-scoring is done, and those scorers can *only* score top-level (they throw UOE from nextDoc/advance). It seems like these two types of scorers should be separately typed.
[jira] [Commented] (LUCENE-5487) Can we separate top scorer from sub scorer?
[ https://issues.apache.org/jira/browse/LUCENE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923134#comment-13923134 ]

Michael McCandless commented on LUCENE-5487:
--------------------------------------------

Rob suggested BulkScorer as a better name than TopScorer ... I like it ... I'll rename it.

Can we separate top scorer from sub scorer?
-------------------------------------------

Key: LUCENE-5487
URL: https://issues.apache.org/jira/browse/LUCENE-5487
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: LUCENE-5487.patch, LUCENE-5487.patch

This is just an exploratory patch ... still many nocommits, but I think it may be promising. I find the two booleans we pass to Weight.scorer confusing, because they really only apply to whoever will call score(Collector) (just IndexSearcher and BooleanScorer). The params are pointless for the vast majority of scorers, because very, very few query scorers really need to change how top-scoring is done, and those scorers can *only* score top-level (they throw UOE from nextDoc/advance). It seems like these two types of scorers should be separately typed.
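For readers following along, the shape of the proposed split might look roughly like the sketch below; the class and method names track the discussion here, but the real signatures are in the attached patches, so treat this as illustrative:

{noformat}
// Rough sketch of separating bulk (top-level) scoring from sub-scoring.
// Names and signatures are assumptions based on the comments above.
import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// A scorer that only scores a whole segment into a Collector; it has no
// nextDoc()/advance() contract, so "top-level only" scorers no longer
// need to throw UnsupportedOperationException from those methods.
abstract class BulkScorerSketch {
    public abstract void score(Collector collector) throws IOException;
}

// Weight would then expose the two roles separately, instead of threading
// scoreDocsInOrder/topScorer booleans through a single scorer() method.
abstract class WeightSketch {
    public abstract Scorer scorer(AtomicReaderContext context) throws IOException;
    public abstract BulkScorerSketch bulkScorer(AtomicReaderContext context) throws IOException;
}
{noformat}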
Stalled unit tests
I'm sure that I'm just missing something obvious, but I'm having trouble getting the unit tests to run to completion on my laptop and was hoping that someone would be kind enough to point me in the right direction.

I've cloned the repository from GitHub (http://git.apache.org/lucene-solr.git) and checked out the latest commit on branch_4x:

commit 6e06247cec1410f32592bfd307c1020b814def06
Author: Robert Muir rm...@apache.org
Date: Thu Mar 6 19:54:07 2014 +0000

    disable slow solr tests in smoketester

    git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x@1575025 13f79535-47bb-0310-9956-ffa450edef68

Executing "ant clean test" from the top-level directory of the project shows the tests running, but they seem to get stuck in a loop with some stalled-heartbeat messages. If I run the tests directly from lucene/ then they complete successfully after about 10 minutes.

I'm using Java 6 under OS X (10.9.2):

$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

My terminal lists repeating stalled-heartbeat messages like so:

HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled for 2111s at: HdfsLockFactoryTest.testBasic
HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled for 2108s at: TestSurroundQueryParser.testQueryParser
HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled for 2167s at: TestRecoveryHdfs.testBuffering
HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled for 2165s at: HdfsDirectoryTest.testEOF

My machine does have 3 Java processes chewing CPU; see the attached jstack dumps for more information. Should I expect the tests to complete on my platform? Do I need to specify any special flags to give them more memory or to avoid any bad apples?

Thanks in advance,
--Terry

20103.jstack.txt.gz Description: GNU Zip compressed data
20104.jstack.txt.gz Description: GNU Zip compressed data
20105.jstack.txt.gz Description: GNU Zip compressed data
[jira] [Updated] (LUCENE-5488) FilteredQuery.explain does not honor FilterStrategy
[ https://issues.apache.org/jira/browse/LUCENE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Wang updated LUCENE-5488:
------------------------------
Attachment: LUCENE-5488.patch

FilteredQuery.explain does not honor FilterStrategy
---------------------------------------------------

Key: LUCENE-5488
URL: https://issues.apache.org/jira/browse/LUCENE-5488
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Affects Versions: 4.6.1
Reporter: John Wang
Attachments: LUCENE-5488.patch, LUCENE-5488.patch

Some Filter implementations produce DocIdSets without an iterator() implementation, such as o.a.l.facet.range.Range.getFilter(). This is done with the intention that they be used in conjunction with FilteredQuery, with FilterStrategy set to QUERY_FIRST_FILTER_STRATEGY, for performance reasons. However, this behavior is not honored by FilteredQuery.explain, where docidset.iterator is called regardless, causing such valid usages of the above filter types to fail. The fix is to check bits() first and fall back to iterator() if bits() is null; in that case, the input Filter is indeed bad. See the attached unit test, which fails without this patch.
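The described fix boils down to a check along these lines; this is a paraphrase of the patch's intent using only the stock DocIdSet API, not the patch itself:

{noformat}
// Sketch: prefer random-access bits() and fall back to iterator() only
// when bits() is unavailable, mirroring QUERY_FIRST_FILTER_STRATEGY.
import java.io.IOException;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;

public class ExplainMatchSketch {
    static boolean filterMatches(DocIdSet docIdSet, int doc) throws IOException {
        if (docIdSet == null) {
            return false;
        }
        Bits bits = docIdSet.bits();
        if (bits != null) {
            return bits.get(doc); // random access: no iterator() needed
        }
        DocIdSetIterator it = docIdSet.iterator();
        // Only here is a null iterator() a genuinely bad Filter.
        return it != null && it.advance(doc) == doc;
    }
}
{noformat}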