Shawn had an interesting idea on another thread. It depends
on having basically an identity field (which I see how to do
manually, but don't see how to make work as a new field type
in a distributed environment). And it's brilliantly simple, just
a range query identity:{ TO *]sort=identity
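The identity-range trick above can be sketched in plain Python. This is a simulation of the paging pattern, not Solr code: the index is stood in for by a sorted list of dicts, and `fetch_page` emulates what a query like `fq=identity:{last TO *]&sort=identity asc&rows=N` would return. The field name `identity` and all function names are illustrative.

```python
# Simulation of paging by identity range instead of start/rows offsets:
# each page filters on identity strictly greater than the last value seen
# (the '{' exclusive bound in Solr range syntax) and sorts by identity,
# so no page ever pays for ranking past the rows it returns.

def fetch_page(index, last_identity, rows):
    """Emulate fq=identity:{last TO *]&sort=identity asc&rows=N."""
    hits = [doc for doc in index
            if last_identity is None or doc["identity"] > last_identity]
    return hits[:rows]

def stream_all(index, rows=3):
    """Walk the whole result set page by page, keyset-style."""
    last = None
    while True:
        page = fetch_page(index, last, rows)
        if not page:
            break
        yield from page
        last = page[-1]["identity"]

index = [{"identity": i, "title": "doc-%d" % i} for i in range(10)]
all_docs = list(stream_all(index, rows=3))
print(len(all_docs))  # 10 - every doc seen exactly once, in order
```

The key property is that each request is stateless on the server side; the client carries the cursor (the last identity value) itself.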
Otis,
You gave links to 'deep paging' when I asked about response streaming.
Let me understand. From my POV, deep paging is a special case of regular
search scenarios. We definitely need it in Solr. However, if we are talking
about data-analytics-like problems, where we need to select an endless
Mikhail,
If your solution gives lazy loading of solr docs /and thus streaming of
huge result lists/ it should be a big YES!
Roman
On 27 Jul 2013 07:55, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
Otis,
You gave links to 'deep paging' when I asked about response streaming.
Let me
Roman,
Let me briefly explain the design
a special RequestParser stores the servlet output stream into the context
https://github.com/m-khl/solr-patches/compare/streaming#L7R22
then a special component injects a PostFilter/DelegatingCollector which
writes right into the output
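The design above can be modeled with a toy collector. This is a plain-Python stand-in, not Solr's actual API: `StreamingCollector` plays the role of the injected DelegatingCollector, `out` plays the role of the servlet output stream stashed in the request context, and the class/field names are illustrative.

```python
import io
import json

# Toy model of the streaming design: instead of accumulating hits into a
# DocList, the collector writes each matched document to the response
# output stream the moment it is collected - nothing is buffered.

class StreamingCollector:
    def __init__(self, out, docs):
        self.out = out    # stands in for the servlet output stream
        self.docs = docs  # stands in for lazy per-doc field lookup
        self.first = True

    def collect(self, doc_id):
        # write immediately, comma-separating after the first hit
        self.out.write(("[" if self.first else ",") + json.dumps(self.docs[doc_id]))
        self.first = False

    def finish(self):
        self.out.write("]" if not self.first else "[]")

docs = {i: {"id": i} for i in range(5)}
out = io.StringIO()
collector = StreamingCollector(out, docs)
for matched in [0, 2, 4]:  # ids the "query" matched, in collection order
    collector.collect(matched)
collector.finish()
print(out.getvalue())  # [{"id": 0},{"id": 2},{"id": 4}]
```

The memory footprint stays constant in the number of hits, which is the whole point of bypassing the DocList.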
Hi Mikhail,
I can see it is lazy-loading, but I can't judge how complex it becomes
(presumably, the filter dispatching mechanism also does other things -
it is not there only for streaming).
Let me just explain better what I found when I dug inside solr: documents
(results of the query)
On Sat, Jul 27, 2013 at 4:30 PM, Roman Chyla roman.ch...@gmail.com wrote:
Let me just explain better what I found when I dug inside solr: documents
(results of the query) are loaded before they are passed into a writer - so
the writers are expecting to encounter the solr documents, but these
Hello,
Please find below
Let me just explain better what I found when I dug inside solr: documents
(results of the query) are loaded before they are passed into a writer - so
the writers are expecting to encounter the solr documents, but these
documents were loaded by one of the components
On Sat, Jul 27, 2013 at 5:05 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
anyway, even if the writer pulls docs one by one, it doesn't allow streaming
a billion of them. Solr writes out a DocList, which is really problematic even
in deep-paging scenarios.
Which part is problematic... the
On Sun, Jul 28, 2013 at 1:25 AM, Yonik Seeley yo...@lucidworks.com wrote:
Which part is problematic... the creation of the DocList (the search),
DocList is literally a copy of TopDocs. Creating TopDocs is not a search,
but a ranking.
And the ranking cost is log(rows+start) on top of numFound, which
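The log(rows+start) cost mentioned above comes from the bounded priority queue that TopDocs-style collectors use: each of the numFound hits pays at most one O(log(rows+start)) heap operation. A minimal sketch of that ranking step, in plain Python rather than Lucene's PriorityQueue (names and tie-breaking are illustrative; Lucene additionally breaks score ties by doc id):

```python
import heapq

# Build the top (start+rows) hits with a min-heap of bounded size k, the
# way a top-docs collector does: every hit costs at most log(k) work,
# which is why deep paging (large start) gets expensive.

def top_docs(scored_hits, start, rows):
    k = start + rows
    heap = []  # min-heap of (score, doc_id), never larger than k
    for doc_id, score in scored_hits:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    ranked = sorted(heap, reverse=True)  # final k*log(k) sort
    return [doc for _, doc in ranked[start:start + rows]]

# 1000 hits ("numFound"), with strictly decreasing scores by doc id
hits = [(i, 1000.0 - i) for i in range(1000)]
print(top_docs(hits, start=0, rows=3))  # -> [0, 1, 2]
```

Only the heap of size rows+start is kept in memory, regardless of numFound; the pain of deep paging is that the heap itself grows with the offset.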
Mikhail,
Yes, +1.
This question comes up a few times a year. Grant created a JIRA issue
for this many moons ago.
https://issues.apache.org/jira/browse/LUCENE-2127
https://issues.apache.org/jira/browse/SOLR-1726
Otis
--
Solr ElasticSearch Support -- http://sematext.com/
Performance Monitoring
Roman,
Can you disclose how that streaming writer works? What does it stream,
docList or docSet?
Thanks
On Wed, Jul 24, 2013 at 5:57 AM, Roman Chyla roman.ch...@gmail.com wrote:
Hello Matt,
You can consider writing a batch processing handler, which receives a query
and instead of sending
Mikhail,
It is a slightly hacked JSONWriter - actually, while poking around, I have
discovered that dumping big hitsets would be possible - the main hurdle
right now is that the writer expects to receive documents with fields
loaded, but if it received something that loads docs lazily, you could
On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber mlie...@impetus.com wrote:
That sounds like a satisfactory solution for the time being -
I am assuming you dump the data from Solr in a csv format?
JSON
How did you implement the streaming processor? (what tool did you use for
this? Not
fwiw,
I did a prototype with the following differences:
- it streams straight to the socket output stream
- it streams on the fly during collecting, without the necessity to store a
bitset.
It might have some limited extreme usage. Is there anyone interested?
On Wed, Jul 24, 2013 at 7:19 PM, Roman
: Subject: Processing a lot of results in Solr
: Message-ID: d57c2b719b792f428beca7b0096c88e22c0...@mail1.impetus.co.in
: In-Reply-To: 1374612243070-4079869.p...@n3.nabble.com
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a
Hello Solr users,
Question regarding processing a lot of docs returned from a query; I
potentially have millions of documents returned back from a query. What is
the common design to deal with this?
2 ideas I have are:
- create a client service that is multithreaded to handle this
- Use the
Hi Matt,
This feature is commonly known as deep paging, and Lucene and Solr have
issues with it ... take a look at
http://solr.pl/en/2011/07/18/deep-paging-problem/ as a potential
starting point using filters to bucketize a result set into a series of
sub result sets.
Cheers,
Tim
On Tue, Jul 23,
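The bucketize-with-filters workaround Tim points at can be sketched as follows. This is a toy Python stand-in, not Solr code: instead of paging one huge result set with ever-larger start offsets, several queries are issued whose filter ranges partition the id space, so each sub-query stays shallow. Function names and the id-range scheme are illustrative assumptions.

```python
# Partition the id space into disjoint half-open ranges and fetch each
# bucket with its own filtered query, avoiding deep start offsets.

def bucket_filters(min_id, max_id, buckets):
    """Yield (lo, hi) half-open ranges partitioning [min_id, max_id)."""
    step = (max_id - min_id + buckets - 1) // buckets
    for lo in range(min_id, max_id, step):
        yield lo, min(lo + step, max_id)

def query(index, lo, hi):
    # stands in for something like q=*:*&fq=id:[lo TO hi}
    return [d for d in index if lo <= d < hi]

index = list(range(100))  # 100 matching doc ids
seen = []
for lo, hi in bucket_filters(0, 100, buckets=4):
    seen.extend(query(index, lo, hi))
print(len(seen))  # 100 - the buckets cover every doc exactly once
```

The trade-off is that result ordering is per-bucket, so this fits export-everything jobs better than ranked retrieval.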
Hello Matt,
You can consider writing a batch processing handler, which receives a query
and instead of sending results back, it writes them into a file which is
then available for streaming (it has its own UUID). I am dumping many GBs
of data from solr in a few minutes - your query + streaming
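Roman's batch-handler idea above can be sketched in a few lines of Python. This is a simulation of the pattern, not his actual handler: the "handler" runs the query, dumps results to a file named by a fresh UUID, and hands the UUID back; the client later streams the file line by line. All names, paths, and the one-doc-per-line JSON layout are illustrative assumptions.

```python
import json
import os
import tempfile
import uuid

# Batch-dump pattern: write query results to a UUID-named file up front,
# then let the client stream that file at its own pace.

def batch_dump(results, dump_dir):
    """Run the 'query' and dump its docs; return the job's UUID."""
    job_id = str(uuid.uuid4())
    path = os.path.join(dump_dir, job_id + ".json")
    with open(path, "w") as f:
        for doc in results:  # one JSON doc per line: trivial to stream
            f.write(json.dumps(doc) + "\n")
    return job_id

def stream_dump(job_id, dump_dir):
    """Lazily yield docs from a previously dumped job."""
    path = os.path.join(dump_dir, job_id + ".json")
    with open(path) as f:
        for line in f:
            yield json.loads(line)

dump_dir = tempfile.mkdtemp()
job = batch_dump([{"id": i} for i in range(4)], dump_dir)
print(sum(1 for _ in stream_dump(job, dump_dir)))  # 4
```

Decoupling the dump from the download is what makes multi-GB exports practical: the expensive query runs once, and the client never holds the whole result set in memory.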
That sounds like a satisfactory solution for the time being -
I am assuming you dump the data from Solr in a csv format?
How did you implement the streaming processor? (what tool did you use for
this? Not familiar with that)
You say it takes a few minutes only to dump the data - how long does it