I see that some work was done to remove the stream handler from the config. So is enabling the stream handler still a security issue?
https://issues.apache.org/jira/browse/SOLR-8262

On Tue, Apr 26, 2016 at 11:14 AM, sudsport s <sudssf2...@gmail.com> wrote:

I am using a Solr 5.3.1 server and SolrJ 5.5 on the client. I will try with SolrJ 6.0.

On Tue, Apr 26, 2016 at 11:12 AM, Susmit Shukla <shukla.sus...@gmail.com> wrote:

Which SolrJ version are you using? Could you try with SolrJ 6.0?

On Tue, Apr 26, 2016 at 10:36 AM, sudsport s <sudssf2...@gmail.com> wrote:

@Joel
> Can you describe how you're planning on using Streaming?

I am mostly using it for the distributed join case. We were planning to use similar logic (hash the id and join) in Spark for our use case, but since the data is stored in Solr, I will use a Solr stream to perform the same operation.

I have similar use cases that build probabilistic data structures while streaming results. I might have to spend some time exploring query optimization (deciding the sort order when doing joins, etc.).

Please let me know if you have any feedback.

On Tue, Apr 26, 2016 at 10:30 AM, sudsport s <sudssf2...@gmail.com> wrote:

Thanks @Reth, yes, that was one of my concerns. I will look at the JIRA you mentioned.

Thanks Joel. I used some of the streaming client examples from your blog. I got a basic tuple stream working, but I get the following exception while running a parallel stream.
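[Editor's sketch] The "hash the id and join" approach described above can be illustrated, outside Solr, as partition-by-hash-of-key followed by a hash join per worker. This is a single-JVM toy, not Solr's actual implementation; all collection contents, field names, and the `HashJoinSketch` class are invented for illustration. In Solr's parallel streams the partitioning is done server-side (via the `partitionKeys` parameter), so each worker only ever sees matching partitions.

```java
import java.util.*;
import java.util.stream.*;

// Toy sketch of "hash the id and join": route each tuple to a worker by
// hashing its join key, then inner-join the two partitions on each worker.
public class HashJoinSketch {

    // Keep only the tuples whose join key hashes to this worker's
    // partition (partitionKeys-style routing).
    static List<Map<String, String>> partition(List<Map<String, String>> tuples,
                                               String key, int worker, int workers) {
        return tuples.stream()
                .filter(t -> Math.floorMod(t.get(key).hashCode(), workers) == worker)
                .collect(Collectors.toList());
    }

    // Inner join on the given key: build a hash table over the left side,
    // probe it with the right side.
    static List<String> innerJoin(List<Map<String, String>> left,
                                  List<Map<String, String>> right, String key) {
        Map<String, Map<String, String>> table = new HashMap<>();
        for (Map<String, String> t : left) table.put(t.get(key), t);
        List<String> out = new ArrayList<>();
        for (Map<String, String> r : right) {
            Map<String, String> l = table.get(r.get(key));
            if (l != null) out.add(l.get(key) + ":" + l.get("name") + ":" + r.get("pet"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> people = List.of(
                Map.of("id", "1", "name", "ann"),
                Map.of("id", "2", "name", "bob"));
        List<Map<String, String>> pets = List.of(
                Map.of("id", "2", "pet", "cat"),
                Map.of("id", "3", "pet", "dog"));
        int workers = 2;
        List<String> joined = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            joined.addAll(innerJoin(partition(people, "id", w, workers),
                                    partition(pets, "id", w, workers), "id"));
        }
        System.out.println(joined); // prints [2:bob:cat] - only id=2 is on both sides
    }
}
```

Because both sides are partitioned with the same hash, any two tuples that share an id land on the same worker, so the per-worker joins can run in parallel with no cross-worker communication.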
java.io.IOException: java.util.concurrent.ExecutionException:
org.noggit.JSONParser$ParseException: JSON Parse Error: char=<,position=0
BEFORE='<' AFTER='html> <head> <meta http-equiv="Content-'
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:332)
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:231)

I tried to look in the Solr logs, and after turning on debug mode I found the following:

POST /solr/collection_shard20_replica1/stream HTTP/1.1
"HTTP/1.1 404 Not Found[\r][\n]"

It looks like the parallel stream is trying to access /stream on the shard. Can someone tell me how to enable the stream handler? I have the export handler enabled. I will look at the latest solrconfig to see if I can turn that on.

@Joel, I am running sizing exercises already; I will run a new one with Solr 5.5+ and docValues enabled on id.

BTW, Solr streaming has amazing response times. Thanks for making it so FAST!!!

On Mon, Apr 25, 2016 at 10:54 AM, Joel Bernstein <joels...@gmail.com> wrote:

Can you describe how you're planning on using Streaming? I can provide some feedback on how it will perform for your use case.

When scaling out Streaming you'll get large performance boosts when you increase the number of shards, replicas, and workers. This is particularly true if you're doing parallel relational algebra or map/reduce operations.

As far as DocValues being expensive with unique fields, you'll want to do a sizing exercise to see how many documents per shard work best for your use case.
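[Editor's sketch] The 404 on /stream above is consistent with the handler not being registered. In Solr 5.x the /stream endpoint had to be declared explicitly in solrconfig.xml (later versions register it implicitly). A minimal sketch, modeled on the 5.x sample config; check the sample solrconfig.xml shipped with your Solr version before relying on it:

```xml
<!-- Explicit registration of the streaming handler (Solr 5.x style).
     Later Solr versions register /stream implicitly, making this
     block unnecessary. -->
<requestHandler name="/stream" class="solr.StreamHandler">
  <lst name="invariants">
    <str name="wt">json</str>
    <str name="distrib">false</str>
  </lst>
</requestHandler>
```

The `distrib=false` invariant matters: /stream on each shard must serve only local results, since the client-side stream does the merging.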
There are different docValues implementations that will allow you to trade off memory for performance.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Apr 25, 2016 at 3:30 AM, Reth RM <reth.ik...@gmail.com> wrote:

Hi,

So, is the concern related to the same field value being stored twice, with stored=true and docValues=true? If that is the case, there is a relevant JIRA, fixed [1]. If you upgrade to the 5.5/6.0 version, it is possible to read non-stored fields from the docValues index; check it out.

[1] https://issues.apache.org/jira/browse/SOLR-8220

On Mon, Apr 25, 2016 at 9:44 AM, sudsport s <sudssf2...@gmail.com> wrote:

Thanks Erick for the reply.

Since I was storing id (it's a stored field), my guess is that after enabling docValues it will be stored in two places. Also, as I understand it, docValues are great when you have values that repeat, and I am not sure how beneficial they would be for a uniqueId field. I am looking at a collection of a few hundred billion documents, which is why I really want to care about the expense from the design phase.

On Sun, Apr 24, 2016 at 7:24 PM, Erick Erickson <erickerick...@gmail.com> wrote:

In a word, "yes".

DocValues aren't particularly expensive, or expensive at all. The idea is that when you sort by a field or facet, the field has to be "uninverted", which builds the entire structure in Java's JVM (this is when the field is _not_ DocValues).
DocValues essentially serialize this structure to disk. So your on-disk index size is larger, but that size is MMapped rather than stored on Java's heap.

Really, the question I'd have to ask, though, is "why do you care about the expense?". If you have a functional requirement that has to be served by returning the id via the /export handler, you really have no choice.

Best,
Erick

On Sun, Apr 24, 2016 at 9:55 AM, sudsport s <sudssf2...@gmail.com> wrote:

I was trying to use Streaming for reading a basic tuple stream. I am using sort by id asc, and I am getting the following exception.

I am using the export search handler as per
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets

null:java.io.IOException: id must have DocValues to use this feature.
    at org.apache.solr.response.SortingResponseWriter.getFieldWriters(SortingResponseWriter.java:241)
    at org.apache.solr.response.SortingResponseWriter.write(SortingResponseWriter.java:120)
    at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:53)
    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:742)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:471)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(

Does it make sense to enable docValues for a unique field? How expensive is it?

If I have an existing collection, can I update the schema and optimize the collection to get docValues enabled for id?

--

Thanks
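[Editor's sketch] The schema change being asked about looks roughly like the following, assuming a schema.xml-managed schema (field type name is illustrative). Note that simply optimizing the collection is not enough: docValues are written at index time, so existing documents must be reindexed for the change to take effect.

```xml
<!-- Enable docValues on the unique key field (sketch; existing
     documents must be reindexed for docValues to be populated). -->
<field name="id" type="string" indexed="true" stored="true" docValues="true"/>
```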