Shawn had an interesting idea on another thread. It depends
on having basically an identity field (which I see how to do
manually, but don't see how to make it work as a new field type
in a distributed environment). And it's brilliantly simple, just
a range query: identity:{ TO *]&sort=identity asc
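For what it's worth, the cursor idea can be simulated outside Solr. A minimal Python sketch, assuming a unique, sortable identity field; the in-memory list below stands in for the index, and `fetch_page` stands in for one range-query request:

```python
def fetch_page(index, last_identity, rows):
    """Simulates q=identity:{last TO *]&sort=identity asc&rows=rows."""
    page = [doc for doc in index
            if last_identity is None or doc["identity"] > last_identity]
    return sorted(page, key=lambda d: d["identity"])[:rows]

def stream_all(index, rows=2):
    """Walk the whole result set by repeatedly querying past the last id seen."""
    last = None
    while True:
        page = fetch_page(index, last, rows)
        if not page:
            break
        yield from page
        last = page[-1]["identity"]  # the cursor for the next request

index = [{"identity": i} for i in (3, 1, 4, 10, 2)]
docs = list(stream_all(index))  # identities come back as 1, 2, 3, 4, 10
```

Because each request only filters past the last identity seen, the cost of a page does not grow with how deep into the result set you are, unlike start/rows paging.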
On Sun, Jul 28, 2013 at 1:25 AM, Yonik Seeley wrote:
>
> Which part is problematic... the creation of the DocList (the search),
>
DocList is literally a copy of TopDocs, and creating TopDocs is not the
search itself but the ranking. Ranking costs log(rows+start) per hit, on
top of counting numFound, which the search itself already pays for.
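The log(rows+start) factor is the per-hit cost of the bounded priority queue used for top-K collection. A rough sketch of that idea (not Lucene's actual collector code; the doc ids and scores are made up):

```python
import heapq

def top_docs(scored_hits, start, rows):
    """Collect the top (start+rows) hits with a bounded min-heap: each
    insertion into a heap of size K = start+rows costs O(log K), regardless
    of how large numFound is."""
    k = start + rows
    heap = []  # min-heap of (score, doc_id); weakest kept hit sits at heap[0]
    for doc_id, score in scored_hits:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    ranked = sorted(heap, reverse=True)  # best score first
    return [doc_id for score, doc_id in ranked][start:start + rows]

# hypothetical (doc_id, score) stream with distinct scores
hits = [(i, (i * 37) % 100) for i in range(100)]
page = top_docs(hits, 2, 3)  # the three hits after skipping the top two
```

This is why deep paging hurts: the heap is sized start+rows, so a large start inflates both memory and the per-hit log factor.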
On Sat, Jul 27, 2013 at 5:05 PM, Mikhail Khludnev
wrote:
> anyway, even if the writer pulls docs one by one, it doesn't allow streaming
> a billion of them. Solr writes out a DocList, which is really problematic even
> in deep-paging scenarios.
Which part is problematic... the creation of the DocList (
Hello,
Please find below
> Let me just explain better what I found when I dug inside solr: documents
> (results of the query) are loaded before they are passed into a writer - so
> the writers are expecting to encounter the solr documents, but these
> documents were loaded by one of the componen
On Sat, Jul 27, 2013 at 4:30 PM, Roman Chyla wrote:
> Let me just explain better what I found when I dug inside solr: documents
> (results of the query) are loaded before they are passed into a writer - so
> the writers are expecting to encounter the solr documents, but these
> documents were load
Hi Mikhail,
I can see it is lazy-loading, but I can't judge how complex it becomes
(presumably the filter dispatching mechanism also does other things -
it is not there only for streaming).
Let me just explain better what I found when I dug inside solr: documents
(results of the query)
Roman,
Let me briefly explain the design
special RequestParser stores servlet output stream into the context
https://github.com/m-khl/solr-patches/compare/streaming#L7R22
then special component injects special PostFilter/DelegatingCollector which
writes right into output
https://github.com/m-kh
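In outline, the collector side of that design might look like the following schematic Python analogue. The class and method names here are hypothetical stand-ins, not the actual Solr PostFilter/DelegatingCollector API; the point is only that collect() writes each hit straight to the response stream instead of buffering a DocList:

```python
import io

class StreamingCollector:
    """Schematic analogue of a delegating collector that writes each
    collected doc straight to the response stream."""
    def __init__(self, out, load_doc):
        self.out = out          # e.g. the output stream stashed in the request context
        self.load_doc = load_doc  # renders one doc as JSON text
        self.first = True

    def collect(self, doc_id):
        prefix = "[" if self.first else ","
        self.first = False
        self.out.write(prefix + self.load_doc(doc_id))

    def finish(self):
        self.out.write("]" if not self.first else "[]")

out = io.StringIO()  # stands in for the servlet output stream
collector = StreamingCollector(out, load_doc=lambda d: f'{{"id":{d}}}')
for matching_doc in (4, 8, 15):   # ids produced by the wrapped query/filter
    collector.collect(matching_doc)
collector.finish()
```

Nothing is held back: memory use stays constant no matter how many docs match, which is the whole point of the design sketched in the message above.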
Mikhail,
If your solution gives lazy loading of solr docs /and thus streaming of
huge result lists/ it should be a big YES!
Roman
On 27 Jul 2013 07:55, "Mikhail Khludnev" wrote:
> Otis,
> You gave links to 'deep paging' when I asked about response streaming.
> Let me understand. From my POV, deep p
Otis,
You gave links to 'deep paging' when I asked about response streaming.
Let me understand. From my POV, deep paging is a special case of regular
search scenarios. We definitely need it in Solr. However, if we are talking
about data-analytics-like problems, where we need to select an "endless"
s
Mikhail,
Yes, +1.
This question comes up a few times a year. Grant created a JIRA issue
for this many moons ago.
https://issues.apache.org/jira/browse/LUCENE-2127
https://issues.apache.org/jira/browse/SOLR-1726
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring
: Subject: Processing a lot of results in Solr
: Message-ID:
: In-Reply-To: <1374612243070-4079869.p...@n3.nabble.com>
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an ex
fwiw,
I did a prototype with the following differences:
- it streams straight to the socket output stream
- it streams on-the-fly during collecting, without needing to store a
bitset.
It might have some limited, extreme uses. Is there anyone interested?
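A minimal sketch of the second point, contrasting on-the-fly streaming with the usual materialize-a-DocSet-first path. This is pure Python with hypothetical helpers, just to show where the O(maxDoc) memory goes:

```python
def stream_matches(max_doc, matches):
    """Stream doc ids during collection: constant memory, no bitset of
    size max_doc is ever built."""
    for doc_id in range(max_doc):
        if matches(doc_id):
            yield doc_id  # would be written to the socket as soon as collected

def collect_then_stream(max_doc, matches):
    """The usual path: materialize a bitset/DocSet first, then iterate it."""
    bits = [matches(d) for d in range(max_doc)]  # O(max_doc) memory up front
    return [d for d in range(max_doc) if bits[d]]

evens_streamed = list(stream_matches(10, lambda d: d % 2 == 0))
```

Both produce the same ids in the same order; the difference is only that the generator never holds the full match set in memory.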
On Wed, Jul 24, 2013 at 7:19 PM, Roman C
On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber wrote:
> That sounds like a satisfactory solution for the time being -
> I am assuming you dump the data from Solr in a csv format?
>
JSON
> How did you implement the streaming processor ? (what tool did you use for
> this? Not familiar with that)
Mikhail,
It is a slightly hacked JSONWriter - actually, while poking around, I
discovered that dumping big hitsets would be possible - the main hurdle
right now is that the writer expects to receive documents with fields
loaded, but if it received something that loads docs lazily, you could
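The lazy-loading idea could look roughly like this: an object that the writer can treat as a loaded document, but which only fetches its fields from the index on first access. This is a hypothetical stand-in, not Solr's actual lazy field loading:

```python
class LazyDoc:
    """Looks like a loaded document to a writer, but defers the field
    fetch until fields() is first called."""
    def __init__(self, doc_id, loader):
        self.doc_id = doc_id
        self._loader = loader
        self._fields = None

    def fields(self):
        if self._fields is None:      # load on demand, exactly once
            self._fields = self._loader(self.doc_id)
        return self._fields

loads = []  # records which docs actually hit the (fake) index
def loader(doc_id):
    loads.append(doc_id)
    return {"id": doc_id, "title": f"doc-{doc_id}"}

docs = [LazyDoc(i, loader) for i in range(3)]  # creating docs loads nothing
first = docs[0].fields()                       # only doc 0 gets loaded
```

Handing a list of such wrappers to a streaming writer means only the docs actually written ever get their fields loaded.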
Roman,
Can you disclose how that streaming writer works? What does it stream,
docList or docSet?
Thanks
On Wed, Jul 24, 2013 at 5:57 AM, Roman Chyla wrote:
> Hello Matt,
>
> You can consider writing a batch processing handler, which receives a query
> and instead of sending results back, it
That sounds like a satisfactory solution for the time being -
I am assuming you dump the data from Solr in a csv format?
How did you implement the streaming processor ? (what tool did you use for
this? Not familiar with that)
You say it takes a few minutes only to dump the data - how long does it t
Hello Matt,
You can consider writing a batch processing handler, which receives a query
and, instead of sending results back, writes them into a file which is
then available for streaming (it has its own UUID). I am dumping many GBs
of data from solr in a few minutes - your query + streaming write
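A rough sketch of such a handler, with a hypothetical run_query callable standing in for a real Solr search; one JSON doc per line goes into a file named by a fresh UUID, which the client can then stream or download:

```python
import json
import tempfile
import uuid
from pathlib import Path

def dump_results(run_query, query, out_dir):
    """Hypothetical batch handler: run the query, write one JSON doc per
    line to a UUID-named file, and return the UUID for later retrieval."""
    token = str(uuid.uuid4())
    path = Path(out_dir) / f"{token}.jsonl"
    with path.open("w") as f:
        for doc in run_query(query):  # iterate hits instead of buffering them
            f.write(json.dumps(doc) + "\n")
    return token

out_dir = tempfile.mkdtemp()
token = dump_results(lambda q: ({"id": i} for i in range(5)), "*:*", out_dir)
lines = (Path(out_dir) / f"{token}.jsonl").read_text().splitlines()
```

Because the handler writes as it iterates, its memory use is independent of result-set size; only the disk holds the full dump.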
Hi Matt,
This feature is commonly known as deep paging and Lucene and Solr have
issues with it ... take a look at
http://solr.pl/en/2011/07/18/deep-paging-problem/ as a potential
starting point using filters to bucketize a result set into sub result
sets.
Cheers,
Tim
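The filter-bucketizing suggestion above can be sketched as a simple range partitioner; the bucket size and helper name here are made up for illustration, and each (lo, hi) pair would become one range filter (e.g. id:[0 TO 999]) fetched as its own sub result set:

```python
def bucketize(ids, bucket_size):
    """Split a sorted id space into contiguous ranges, each of which can be
    fetched separately as a filter query instead of one giant result set."""
    lo, hi = min(ids), max(ids)
    filters = []
    start = lo
    while start <= hi:
        end = start + bucket_size - 1
        filters.append((start, end))
        start = end + 1
    return filters

ids = range(0, 2500)
filters = bucketize(ids, 1000)  # three range filters covering all ids
```

Each bucket is then a small, shallow query, sidestepping the deep-paging cost of one huge sorted result.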
On Tue, Jul 23, 2013
Hello Solr users,
Question regarding processing a lot of docs returned from a query; I
potentially have millions of documents returned from a query. What is
the common design to deal with this?
2 ideas I have are:
- create a client service that is multithreaded to handle this
- Use the Sol
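The first idea, a multithreaded client pulling pages concurrently, might be sketched like this; fetch_page is a hypothetical stand-in for a real start/rows request against Solr:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_page(page_no, rows=100):
    """Hypothetical page fetch; a real client would issue a start/rows query."""
    start = page_no * rows
    return list(range(start, start + rows))  # fake doc ids for this page

def fetch_all(num_pages, workers=4):
    """Pull pages concurrently; map() preserves page order on reassembly."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(fetch_page, range(num_pages)))
    return [doc for page in pages for doc in page]

docs = fetch_all(3)  # 300 hypothetical doc ids, in order
```

Note the caveat raised later in the thread: parallelism hides request latency, but each deep start/rows request still pays the server-side ranking cost, so this alone does not solve the millions-of-docs case.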