Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
It’s going to haunt me if I don’t bring up Hossman. I don’t feel I have to,
because who doesn’t know him?

He is a treasure that doesn’t spend much time on SolrCloud and has checked
out of leadership for the most part, for reasons I won’t argue with.

Why doesn’t he do much with SolrCloud in a real way? I can only guess. He
will tell you it’s above his pay grade or some dumb shit.

IMO, it’s probably more that super thorough people try to be thorough with
SolrCloud and when you do that, it will poke your eye out with a stick. And
then throw you over a cliff.

Make it something he can work on more than tangentially.

Mark
-- 
- Mark

http://about.me/markrmiller


Re: Cursor mark page duplicates

2019-11-28 Thread Dwane Hall
Thanks Shawn, you are indeed correct: these are NRT replicas! Thanks very much 
for the advice and possible resolutions. I went down the NRT path as in the 
past I've read advice from some of the Solr gurus recommending these 
replica types unless you have a very good reason not to. I do have basic auth 
enabled on my SolrCloud configuration and believe I can't use PULL replicas 
until the following JIRA is resolved 
(https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-11904), as 
Solr uses the index replicator for this process. With this being the case I'll 
attempt your second suggestion and see how I go. Thanks again for taking the 
time to look at this; it really was a confusing one to debug. Have a great 
weekend, fellow Solr users, and happy Solr-ing.

Dwane

From: Shawn Heisey 
Sent: Friday, 29 November 2019 4:51 AM
To: solr-user@lucene.apache.org 
Subject: Re: Cursor mark page duplicates

On 11/28/2019 1:30 AM, Dwane Hall wrote:
> I asked a question on the forum a couple of weeks ago regarding cursorMark 
> duplicates.  I initially thought it may be due to HDFSCaching because I was 
> unable to replicate the issue on local indexes but unfortunately the dreaded 
> duplicates have returned!! For a refresher I was seeing what I thought was 
> duplicate documents appearing randomly on the last page of one cursor, and 
> the first page of the next.  So if rows=50 the duplicates are document 50 on 
> page 1 and document 1 on page 2.
>
> After further investigation I don't actually believe these documents are 
> duplicates but the same document being returned from a different replica on 
> each page.  After running a diff on the two documents the only difference is 
> the field "Solr_Update_Date" which I insert on each document as it is 
> inserted into the corpus.
>
> This is how the managed-schema mapping for this field looks
>
> <field name="Solr_Update_Date" ... default="NOW" />
This can happen with SolrCloud using NRT replicas.  The default replica
type is NRT.  Based on the core names returned by the [shard] field in
your responses, it looks like you do have NRT replicas.

There are two solutions.  The better solution is to use
TimestampUpdateProcessorFactory for setting your timestamp field instead
of a default of NOW in the schema.  An alternate solution is to use
TLOG/PULL replica types instead of NRT -- that way replicas are
populated by copying exact index contents instead of independently indexing.
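
A minimal sketch of the first approach in solrconfig.xml, assuming the 
Solr_Update_Date field from the schema above (the chain name is illustrative; 
with default="true" it applies to all update requests, or it can be selected 
per-request with update.chain):

```xml
<!-- Set Solr_Update_Date once when the update request is received, before
     the document is distributed to replicas, instead of relying on a
     schema default of NOW evaluated independently on each replica -->
<updateRequestProcessorChain name="add-timestamp" default="true">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">Solr_Update_Date</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```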

Thanks,
Shawn


Re: problem using Http2SolrClient with solr 8.3.0

2019-11-28 Thread Shawn Heisey

On 11/28/2019 9:30 AM, Odysci wrote:

No, I did nothing specific to Jetty. Should I?


The http/2 Solr client uses a different http client than the previous 
ones do.  It uses the client from Jetty, while the previous clients use 
the one from Apache.


Achieving http/2 with the Apache client would have required using a beta 
release, while the Jetty client has had http/2 in a GA release for three 
years.


The error message you're getting indicates that you have not included 
the Jetty client jar in your project.  Using a dependency manager should 
pull in all required dependencies.  If you're not using a dependency 
manager, you will find all the jars that you need in the dist/solrj-lib 
directory in the Solr download.
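
For example, with Maven the single SolrJ dependency pulls in the Jetty client 
artifacts (jetty-client, http2-client, http2-http-client-transport) 
transitively; if you manage jars by hand instead, the class in this trace 
lives in the jetty-client jar under dist/solrj-lib:

```xml
<!-- pom.xml: solr-solrj declares the Jetty HTTP/2 client jars as
     dependencies, so a dependency manager fetches them automatically -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>8.3.0</version>
</dependency>
```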


Thanks,
Shawn


Re: Cursor mark page duplicates

2019-11-28 Thread Shawn Heisey

On 11/28/2019 1:30 AM, Dwane Hall wrote:

I asked a question on the forum a couple of weeks ago regarding cursorMark 
duplicates.  I initially thought it may be due to HDFSCaching because I was 
unable to replicate the issue on local indexes but unfortunately the dreaded 
duplicates have returned!! For a refresher I was seeing what I thought was 
duplicate documents appearing randomly on the last page of one cursor, and the 
first page of the next.  So if rows=50 the duplicates are document 50 on page 1 
and document 1 on page 2.

After further investigation I don't actually believe these documents are duplicates but 
the same document being returned from a different replica on each page.  After running a 
diff on the two documents the only difference is the field "Solr_Update_Date" 
which I insert on each document as it is inserted into the corpus.

This is how the managed-schema mapping for this field looks

<field name="Solr_Update_Date" ... default="NOW" />

This can happen with SolrCloud using NRT replicas.  The default replica 
type is NRT.  Based on the core names returned by the [shard] field in 
your responses, it looks like you do have NRT replicas.


There are two solutions.  The better solution is to use 
TimestampUpdateProcessorFactory for setting your timestamp field instead 
of a default of NOW in the schema.  An alternate solution is to use 
TLOG/PULL replica types instead of NRT -- that way replicas are 
populated by copying exact index contents instead of independently indexing.
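
The second approach is chosen when the collection is created; a hedged 
example with the Collections API (host, collection name, and replica counts 
are placeholders):

```
http://solrHost:8983/solr/admin/collections?action=CREATE&name=my_collection
    &numShards=4&tlogReplicas=1&pullReplicas=1&nrtReplicas=0
```

For an existing collection, ADDREPLICA with type=TLOG or type=PULL can add 
replicas of the new type before the old NRT replicas are removed.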


Thanks,
Shawn


Re: problem using Http2SolrClient with solr 8.3.0

2019-11-28 Thread Odysci
No, I did nothing specific to Jetty. Should I?
Thx

On Wed, Nov 27, 2019 at 6:54 PM Houston Putman 
wrote:

> Are you overriding the Jetty version in your application using SolrJ?
>
> On Wed, Nov 27, 2019 at 4:00 PM Odysci  wrote:
>
> > Hi,
> > I have a solr cloud setup using solr 8.3 and SolrJ, which works fine
> using
> > the HttpSolrClient as well as the CloudSolrClient. I use 2 solr nodes
> with
> > 3 Zookeeper nodes.
> > Recently I configured my machines to handle ssl, http/2 and then I tried
> > using in my java code the Http2SolrClient supported by SolrJ 8.3.0, but I
> > got the following error at run time upon instantiating the
> Http2SolrClient
> > object:
> >
> > Has anyone seen this problem?
> > Thanks
> > Reinaldo
> > ===
> >
> > Oops: NoClassDefFoundError
> > Unexpected error : Unexpected Error, caused by exception
> > NoClassDefFoundError: org/eclipse/jetty/client/api/Request
> >
> > play.exceptions.UnexpectedException: Unexpected Error
> > at play.jobs.Job.onException(Job.java:180)
> > at play.jobs.Job.call(Job.java:250)
> > at Invocation.Job(Play!)
> > Caused by: java.lang.NoClassDefFoundError:
> > org/eclipse/jetty/client/api/Request
> > at
> >
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient$AsyncTracker.<init>(Http2SolrClient.java:789)
> > at
> >
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.<init>(Http2SolrClient.java:131)
> > at
> >
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:833)
> > ... more
> > Caused by: java.lang.ClassNotFoundException:
> > org.eclipse.jetty.client.api.Request
> > at
> >
> >
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
> > at
> >
> >
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> > ... 16 more
> > ==
> >
>


Re: problem using Http2SolrClient with solr 8.3.0

2019-11-28 Thread Odysci
I'm using OpenJDK 11

On Wed, Nov 27, 2019 at 7:12 PM Jörn Franke  wrote:

> Which JDK version? In this setting I would recommend JDK 11.
>
> > Am 27.11.2019 um 22:00 schrieb Odysci :
> >
> > Hi,
> > I have a solr cloud setup using solr 8.3 and SolrJ, which works fine
> using
> > the HttpSolrClient as well as the CloudSolrClient. I use 2 solr nodes
> with
> > 3 Zookeeper nodes.
> > Recently I configured my machines to handle ssl, http/2 and then I tried
> > using in my java code the Http2SolrClient supported by SolrJ 8.3.0, but I
> > got the following error at run time upon instantiating the
> Http2SolrClient
> > object:
> >
> > Has anyone seen this problem?
> > Thanks
> > Reinaldo
> > ===
> >
> > Oops: NoClassDefFoundError
> > Unexpected error : Unexpected Error, caused by exception
> > NoClassDefFoundError: org/eclipse/jetty/client/api/Request
> >
> > play.exceptions.UnexpectedException: Unexpected Error
> > at play.jobs.Job.onException(Job.java:180)
> > at play.jobs.Job.call(Job.java:250)
> > at Invocation.Job(Play!)
> > Caused by: java.lang.NoClassDefFoundError:
> > org/eclipse/jetty/client/api/Request
> > at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient$AsyncTracker.<init>(Http2SolrClient.java:789)
> > at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.<init>(Http2SolrClient.java:131)
> > at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:833)
> > ... more
> > Caused by: java.lang.ClassNotFoundException:
> > org.eclipse.jetty.client.api.Request
> > at
> >
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
> > at
> >
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> > ... 16 more
> > ==
>


Re: Solr Paryload example

2019-11-28 Thread Vincenzo D'Amore
Hi all,

I’ve prepared the pull request and submitted the issue.

https://issues.apache.org/jira/browse/SOLR-13863


If anyone is interested in the feature, please write your opinion in the
thread.
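
For anyone trying this, payload fields like the payloadCurrency example 
quoted below are typically backed by a delimited-payload field type; a 
sketch, assuming the "|" delimiter and a string (identity) payload encoder:

```xml
<!-- schema: for tokens like "store2|EUR", the text after '|' is stored
     as the token's payload rather than as part of the term -->
<fieldType name="delimited_payloads_string" class="solr.TextField"
           indexed="true" stored="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            encoder="identity" delimiter="|"/>
  </analyzer>
</fieldType>
```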

Ciao,
Vincenzo

--
mobile: 3498513251
skype: free.dev

On 24 Oct 2019, at 13:59, Vincenzo D'Amore  wrote:


Hi all,

just to let you know that we started using the spayload function in our
quality environment for testing.
Within a couple of weeks the feature will be deployed in production.

Best regards,
Vincenzo

On Wed, Oct 23, 2019 at 4:31 PM Vincenzo D'Amore  wrote:

> Hi Erick, yes, absolutely, it's a great pleasure for me to contribute.
>
> On Wed, Oct 23, 2019 at 2:25 PM Erick Erickson 
> wrote:
>
>> Bookmarked. Do you intend that this should be incorporated into Solr? If
>> so, please raise a JIRA and link your PR in….
>>
>> Thanks!
>> Erick
>>
>> > On Oct 22, 2019, at 6:56 PM, Vincenzo D'Amore 
>> wrote:
>> >
>> > Hi all,
>> >
>> > this evening I had some spare hour to spend in order to put everything
>> > together in a repository.
>> >
>> > https://github.com/freedev/solr-payload-string-function-query
>> >
>> >
>> >
>> > On Tue, Oct 22, 2019 at 5:54 PM Vincenzo D'Amore 
>> wrote:
>> >
>> >> Hi all,
>> >>
>> >> thanks for the support. And many thanks to those who have implemented
>> >> the integration of the github Solr repository with the intellij IDE.
>> >> To configure the environment and run the debugger I spent less than one
>> >> hour (and most of the time I had to wait for the compilation).
>> >> Solr and you guys really rock together.
>> >>
>> >> What I've done:
>> >>
>> >> I was looking at how the original payload function is defined in
>> >> the ValueSourceParser; this function uses a FloatPayloadValueSource to
>> >> return the value found.
>> >>
>> >> As said, I wrote a new version of the payload function that handles
>> >> strings; I named it spayload, and basically it is able to extract the
>> >> string value from the payload.
>> >>
>> >> Given the former example where I have a multivalue field
>> payloadCurrency
>> >>
>> >> payloadCurrency: [
>> >> "store1|USD",
>> >> "store2|EUR",
>> >> "store3|GBP"
>> >> ]
>> >>
>> >> executing spayload(payloadCurrency,store2) returns "EUR", and so on for
>> >> the remaining key/value in the field.
>> >>
>> >> To implement the spayload function, I've added a new ValueSourceParser
>> >> instance to the list of defined functions, which returns
>> >> a StringPayloadValueSource with the value inside (it does the same thing
>> >> as the former FloatPayloadValueSource).
>> >>
>> >> That's all. As said, always beware of your code when it works at first
>> >> run. And really there was something wrong: initially I messed up the
>> >> conversion of the payload into String (bytes, offset, etc).
>> >> Now it is fixed, or at least it seems to me.
>> >> I see this function cannot be used in sorting; very likely the simple
>> >> implementation of the StringPayloadValueSource misses something.
>> >>
>> >> As far as I understand I'm scratching the surface of this solution;
>> >> there are a few things I'm worried about. I have a bunch of questions,
>> >> please be patient.
>> >> This function returns an empty string "" when it does not match any key;
>> >> should it return an empty value instead? I'm not sure what the correct
>> >> way to return an empty value is.
>> >> I wasn't able to find a unit test for the payload function in the tests.
>> >> Could you give me a few suggestions on how to test the implementation
>> >> properly?
>> >> If spayload is used on a different field type (i.e. using spayload on a
>> >> float payload) the behaviour is not handled. Can this function check the
>> >> type of the payload content?
>> >> And at last, what do you think: could this simple fix be interesting for
>> >> the Solr community? May I try to submit a pull request or add a feature
>> >> to JIRA?
>> >>
>> >> Best regards,
>> >> Vincenzo
>> >>
>> >>
>> >> On Mon, Oct 21, 2019 at 9:12 PM Erik Hatcher 
>> >> wrote:
>> >>
>> >>> Yes.   The decoding of a payload based on its schema type is what the
>> >>> payload() function does.   Your Payloader won't currently work
>> well/legibly
>> >>> for fields encoded numerically:
>> >>>
>> >>>
>> >>>
>> https://github.com/o19s/payload-component/blob/master/src/main/java/com/o19s/payloads/Payloader.java#L130
>> >>> <
>> >>>
>> https://github.com/o19s/payload-component/blob/master/src/main/java/com/o19s/payloads/Payloader.java#L130
>> 
>> >>>
>> >>> I think that code could probably be slightly enhanced to leverage
>> >>> PayloadUtils.getPayloadDecoder(fieldType) and use bytes if the field
>> type
>> >>> doesn't have a better decoder.
>> >>>
>> >>>Erik
>> >>>
>> >>>
>>  On Oct 21, 2019, at 2:55 PM, Eric Pugh <
>> ep...@opensourceconnections.com>
>> >>> wrote:
>> 
>>  Have you checked out
>> 

Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
I’m including this response to a private email because it’s not something
I’ve brought up and I also think it’s a critical note:

“Yes. That is our biggest advantage. Being Apache. Almost no one seems to
be employed to help other contributors get their work in at the right
level, and all the money has ensured the end of the hobbyist. I hope that
changes too.”

-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
Yes. That is our biggest advantage. Being Apache. Almost no one seems to be
employed to help other contributors get their work in at the right level,
and all the money has ensured the end of the hobbyist. I hope that changes
too.

Thanks for the note.

Mark

On Thu, Nov 28, 2019 at 1:55 PM Paras Lehana 
wrote:

> Hey Mark,
>
> I was actually expecting (and wanting) this after your LinkedIn post.
>
> At this point, the best way to use Solr is as it’s always been - avoid
>> SolrCloud and setup your own system in standalone mode.
>
>
> That's what I have been telling people who are just getting started with
> Solr and thinking that SolrCloud is actually something superior to the
> standalone mode. That may depend on the use case, but for me, I always
> prefer to achieve things from a standalone perspective instead of investing
> my time over switching to Cloud.
>
> I handle Auto-Suggest at IndiaMART. We have over 60 million docs. Single
> server of *standalone* Solr is capable of handling 800 req/sec. In fact,
> on production, we get ~300 req/sec and the single Solr is still able to
> provide responses within 25 ms!
>
> Anyways, I don't think that the project was a failure. All these were the
> small drops of the big Solr Ocean. We, the community and you, tried, we
> tested and we are still here as the open community of one of the most
> powerful search platforms. SolrCloud was also needed to be introduced at
> some time. Notwithstanding, I do think that the project needs to be more
> open with community commits. The community and open-sourceness of Solr is
> what I used to love over that of Elasticsearch.
>
> Anyways, keep rocking! You have already left your footprints into the
> history of this beast project.
>
> On Thu, 28 Nov 2019 at 09:10, Mark Miller  wrote:
>
>> Now one company thinks I’m after them because they were the main source of
>> the jokes.
>>
>> Companies is not a typo.
>>
>> If you are using Solr to make or save tons of money or run your business
>> and you employ developers, please include yourself in this list.
>>
>> You are taking, and in my opinion Solr is going down. It’s all against your
>> own interest even.
>>
>> I know of enough people that want to solve this now, that it’s likely only
>> a matter of time before they fix the situation - you never know though.
>> Things change, people get new jobs, jobs change. It will take at least 3-6
>> months to make things reasonable even with a good group banding together.
>>
>> But if you are extracting value from this project and have Solr developers
>> - id like to think you have enough of a stake in this to think about
>> changing the approach everyone has been taking. It’s not working, and the
>> longer it goes on, the harder it’s getting to fix things.
>>
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
>
> 
>
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
The people I have identified that I have the most faith in to lead the
fixing of Solr are Ishan, Noble and David. I encourage you all to look at
and follow and join in their leadership.

You can do this.


Mark
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Paras Lehana
Hey Mark,

I was actually expecting (and wanting) this after your LinkedIn post.

At this point, the best way to use Solr is as it’s always been - avoid
> SolrCloud and setup your own system in standalone mode.


That's what I have been telling people who are just getting started with
Solr and thinking that SolrCloud is actually something superior to the
standalone mode. That may depend on the use case, but for me, I always
prefer to achieve things from a standalone perspective instead of investing
my time over switching to Cloud.

I handle Auto-Suggest at IndiaMART. We have over 60 million docs. Single
server of *standalone* Solr is capable of handling 800 req/sec. In fact, on
production, we get ~300 req/sec and the single Solr is still able to
provide responses within 25 ms!

Anyways, I don't think that the project was a failure. All these were the
small drops of the big Solr Ocean. We, the community and you, tried, we
tested and we are still here as the open community of one of the most
powerful search platforms. SolrCloud was also needed to be introduced at
some time. Notwithstanding, I do think that the project needs to be more
open with community commits. The community and open-sourceness of Solr is
what I used to love over that of Elasticsearch.

Anyways, keep rocking! You have already left your footprints into the
history of this beast project.

On Thu, 28 Nov 2019 at 09:10, Mark Miller  wrote:

> Now one company thinks I’m after them because they were the main source of
> the jokes.
>
> Companies is not a typo.
>
> If you are using Solr to make or save tons of money or run your business
> and you employ developers, please include yourself in this list.
>
> You are taking, and in my opinion Solr is going down. It’s all against your
> own interest even.
>
> I know of enough people that want to solve this now, that it’s likely only
> a matter of time before they fix the situation - you never know though.
> Things change, people get new jobs, jobs change. It will take at least 3-6
> months to make things reasonable even with a good group banding together.
>
> But if you are extracting value from this project and have Solr developers
> - id like to think you have enough of a stake in this to think about
> changing the approach everyone has been taking. It’s not working, and the
> longer it goes on, the harder it’s getting to fix things.
>
>
> --
> - Mark
>
> http://about.me/markrmiller
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Cursor mark page duplicates

2019-11-28 Thread Dwane Hall
Hey guys,

I asked a question on the forum a couple of weeks ago regarding cursorMark 
duplicates.  I initially thought it may be due to HDFSCaching because I was 
unable to replicate the issue on local indexes but unfortunately the dreaded 
duplicates have returned!! For a refresher I was seeing what I thought was 
duplicate documents appearing randomly on the last page of one cursor, and the 
first page of the next.  So if rows=50 the duplicates are document 50 on page 1 
and document 1 on page 2.

After further investigation I don't actually believe these documents are 
duplicates but the same document being returned from a different replica on 
each page.  After running a diff on the two documents the only difference is 
the field "Solr_Update_Date" which I insert on each document as it is inserted 
into the corpus.

This is how the managed-schema mapping for this field looks

<field name="Solr_Update_Date" ... default="NOW" />




The only sort parameter is the id field

"sort":"id desc"

rows=50
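
For context, the two consecutive cursor requests in question follow the 
standard cursorMark pattern (collection name and query are placeholders; each 
page's nextCursorMark from the response feeds the next request):

```
/solr/my_collection/select?q=*:*&sort=id+desc&rows=50&cursorMark=*
/solr/my_collection/select?q=*:*&sort=id+desc&rows=50&cursorMark=<nextCursorMark from page 1>
```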


Here are the results




Document 50 on page 1 is



{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":8,
    "params":{
      "q":"id:\"2019-10-29 15:15:36.748052\"",
      "fl":"id,_version_,[shard],Solr_Update_Date",
      "_":"1574900506126"}},
  "response":{"numFound":1,"start":0,"maxScore":7.312953,"docs":[
      {
        "id":"2019-10-29 15:15:36.748052",
        "Solr_Update_Date":"2019-11-01T00:15:07.811Z",
        "_version_":1648956337338449920,
        "[shard]":"https://solrHost:9021/solr/my_collection_shard4_replica_n14/|https://solrHost:9022/solr/my_collection_shard4_replica_n12/"}]
  }}



Document 1 on page 2 is


{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":7,
    "params":{
      "q":"id:\"2019-10-29 15:15:36.748052\"",
      "fl":"id,_version_,[shard],Solr_Update_Date",
      "_":"1574900506126"}},
  "response":{"numFound":1,"start":0,"maxScore":7.822712,"docs":[
      {
        "id":"2019-10-29 15:15:36.748052",
        "Solr_Update_Date":"2019-11-01T00:15:07.794Z",
        "_version_":1648956337338449920,
        "[shard]":"https://solrHost:9022/solr/my_collection_shard4_replica_n12/|https://solrHost:9021/solr/my_collection_shard4_replica_n14/"}]
  }}


As you can see both documents have the same version number but different 
maxScores and Solr_Update_Dates.  My understanding is the cursorMark should 
only be generated off the id field, so I can't see why I would get a different 
document from a different shard at the end of one page and the beginning of 
the next.  Would anyone have any insight into this behaviour?  It happens 
randomly on page boundaries when paging through results.

Thanks for your time

Dwane



From: Dwane Hall 
Sent: Monday, 11 November 2019 10:10 PM
To: solr-user@lucene.apache.org 
Subject: Re: Cursor mark page duplicates

Thanks Erick/Hossman,

I appreciate your input it's always an interesting read seeing Solr legends 
like yourselves work through a problem!  I certainly learn a lot from following 
your responses in this user group.

As you recommended I ran the distrib=false query on each shard and the results 
were identical in both instances.  Below is a snapshot from the admin ui 
showing the details of each shard, which all looks in order to me (other than 
our large number of deletes in the corpus ...we have quite a dynamic 
environment when the index is live)
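
For anyone following along, the per-replica check looks something like this, 
hitting each core directly rather than the collection (core names taken from 
the [shard] field in the earlier responses; the id is URL-encoded):

```
https://solrHost:9021/solr/my_collection_shard4_replica_n14/select?q=id:%222019-10-29%2015:15:36.748052%22&distrib=false
https://solrHost:9022/solr/my_collection_shard4_replica_n12/select?q=id:%222019-10-29%2015:15:36.748052%22&distrib=false
```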


Last Modified:23 days ago

Num Docs:47247895

Max Doc:68108804

Heap Memory Usage:-1

Deleted Docs:20860909

Version:528038

Segment Count:41



Master (Searching) Version:1571148411550 Gen:25528 Size:42.56 GB

Master (Replicable) Version:1571153302013 Gen:25529



Last Modified:23 days ago

Num Docs:47247895

Max Doc:68223647

Heap Memory Usage:-1

Deleted Docs:20975752

Version:526613

Segment Count:43



Master (Searching) Version:1571148411615 Gen:25527 Size:42.63 GB

Master (Replicable) Version:1571153302076 Gen:25528

I was however able to replicate the issue, under unusual circumstances, with 
some crude in-browser testing.  If I use a cursorMark other than "*" and 
constantly re-run the query (just resubmitting the url in a browser with the 
same cursor and query) the first result on the page toggles between the 
expected value, and the last item from the previous page.  So if rows=50, page 
2 toggles between result 51 (expected) and result 50 (the last item from the 
previous page).  It doesn't happen all the time but every one in five or so 
refreshes I'm able to replicate it consistently (and on every subsequent 
cursor).

I failed to mention in my original email that we use the HdfsDirectoryFactory 
to store our indexes in HDFS.  This configuration uses an off heap block cache 
to cache HDFS blocks in memory as it is unable to take advantage of the OS disk 
cache.  I mention this as we're currently in the process of switching to local 
disk and I've been unable to replicate the issue when using the local