I see that some work was done to remove the stream handler from the config. So is enabling the stream handler still a security issue?
https://issues.apache.org/jira/browse/SOLR-8262

On Tue, Apr 26, 2016 at 11:14 AM, sudsport s <sudssf2...@gmail.com> wrote:

I am using a Solr 5.3.1 server and SolrJ 5.5 on the client. I will try with SolrJ 6.0.

On Tue, Apr 26, 2016 at 11:12 AM, Susmit Shukla <shukla.sus...@gmail.com> wrote:

Which SolrJ version are you using? Could you try with SolrJ 6.0?

On Tue, Apr 26, 2016 at 10:36 AM, sudsport s <sudssf2...@gmail.com> wrote:

@Joel
> Can you describe how you're planning on using Streaming?

I am mostly using it for the distributed join case. We were planning to use similar logic (hash the id and join) in Spark for our use case, but since the data is stored in Solr, I will use a Solr stream to perform the same operation.

I have similar use cases that build probabilistic data structures while streaming results. I might have to spend some time exploring query optimization (deciding the sort order when doing joins, etc.).

Please let me know if you have any feedback.

On Tue, Apr 26, 2016 at 10:30 AM, sudsport s <sudssf2...@gmail.com> wrote:

Thanks @Reth, yes, that was one of my concerns. I will look at the JIRA you mentioned.

Thanks Joel. I used some of the streaming client examples from your blog. I got a basic tuple stream working, but I get the following exception while running a parallel stream.
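[Editor's sketch] The "hash the id and join" approach described above can be illustrated, outside Solr, as partition-by-hash-of-key followed by a hash join per worker. This is a single-JVM toy, not Solr's actual implementation; all collection contents, field names, and the `HashJoinSketch` class are invented for illustration. In Solr's parallel streams the partitioning is done server-side (via the `partitionKeys` parameter), so each worker only ever sees matching partitions.

```java
import java.util.*;
import java.util.stream.*;

// Toy sketch of "hash the id and join": route each tuple to a worker by
// hashing its join key, then inner-join the two partitions on each worker.
public class HashJoinSketch {

    // Keep only the tuples whose join key hashes to this worker's
    // partition (partitionKeys-style routing).
    static List<Map<String, String>> partition(List<Map<String, String>> tuples,
                                               String key, int worker, int workers) {
        return tuples.stream()
                .filter(t -> Math.floorMod(t.get(key).hashCode(), workers) == worker)
                .collect(Collectors.toList());
    }

    // Inner join on the given key: build a hash table over the left side,
    // probe it with the right side.
    static List<String> innerJoin(List<Map<String, String>> left,
                                  List<Map<String, String>> right, String key) {
        Map<String, Map<String, String>> table = new HashMap<>();
        for (Map<String, String> t : left) table.put(t.get(key), t);
        List<String> out = new ArrayList<>();
        for (Map<String, String> r : right) {
            Map<String, String> l = table.get(r.get(key));
            if (l != null) out.add(l.get(key) + ":" + l.get("name") + ":" + r.get("pet"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> people = List.of(
                Map.of("id", "1", "name", "ann"),
                Map.of("id", "2", "name", "bob"));
        List<Map<String, String>> pets = List.of(
                Map.of("id", "2", "pet", "cat"),
                Map.of("id", "3", "pet", "dog"));
        int workers = 2;
        List<String> joined = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            joined.addAll(innerJoin(partition(people, "id", w, workers),
                                    partition(pets, "id", w, workers), "id"));
        }
        System.out.println(joined); // prints [2:bob:cat] - only id=2 is on both sides
    }
}
```

Because both sides are partitioned with the same hash, any two tuples that share an id land on the same worker, so the per-worker joins can run in parallel with no cross-worker communication.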
java.io.IOException: java.util.concurrent.ExecutionException:
org.noggit.JSONParser$ParseException: JSON Parse Error: char=<,position=0
BEFORE='<' AFTER='html> <head> <meta http-equiv="Content-'
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:332)
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:231)

I tried to look in the Solr logs, and after turning on debug mode I found the following:

POST /solr/collection_shard20_replica1/stream HTTP/1.1
"HTTP/1.1 404 Not Found[\r][\n]"

It looks like the parallel stream is trying to access /stream on the shard. Can someone tell me how to enable the stream handler? I have the export handler enabled. I will look at the latest solrconfig to see if I can turn that on.

@Joel, I am running sizing exercises already; I will run a new one with Solr 5.5+ and docValues enabled on id.

BTW, Solr streaming has amazing response times. Thanks for making it so FAST!!!

On Mon, Apr 25, 2016 at 10:54 AM, Joel Bernstein <joels...@gmail.com> wrote:

Can you describe how you're planning on using Streaming? I can provide some feedback on how it will perform for your use case.

When scaling out Streaming you'll get large performance boosts when you increase the number of shards, replicas, and workers. This is particularly true if you're doing parallel relational algebra or map/reduce operations.

As far as DocValues being expensive with unique fields, you'll want to do a sizing exercise to see how many documents per shard work best for your use case.
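[Editor's sketch] The 404 on /stream above is consistent with the handler not being registered. In Solr 5.x the /stream endpoint had to be declared explicitly in solrconfig.xml (later versions register it implicitly). A minimal sketch, modeled on the 5.x sample config; check the sample solrconfig.xml shipped with your Solr version before relying on it:

```xml
<!-- Explicit registration of the streaming handler (Solr 5.x style).
     Later Solr versions register /stream implicitly, making this
     block unnecessary. -->
<requestHandler name="/stream" class="solr.StreamHandler">
  <lst name="invariants">
    <str name="wt">json</str>
    <str name="distrib">false</str>
  </lst>
</requestHandler>
```

The `distrib=false` invariant matters: /stream on each shard must serve only local results, since the client-side stream does the merging.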
There are different docValues implementations that will allow you to trade off memory for performance.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Apr 25, 2016 at 3:30 AM, Reth RM <reth.ik...@gmail.com> wrote:

Hi,

So, is the concern related to the same field value being stored twice, with stored=true and docValues=true? If that is the case, there is a relevant JIRA, fixed [1]. If you upgrade to the 5.5/6.0 version, it is possible to read non-stored fields from the docValues index; check it out.

[1] https://issues.apache.org/jira/browse/SOLR-8220

On Mon, Apr 25, 2016 at 9:44 AM, sudsport s <sudssf2...@gmail.com> wrote:

Thanks Erick for the reply.

Since I was storing id (it's a stored field), my guess is that after enabling docValues it will be stored in two places. Also, as I understand it, docValues are great when you have values that repeat, and I am not sure how beneficial they would be for a uniqueId field. I am looking at a collection of a few hundred billion documents, which is why I really want to care about the expense from the design phase.

On Sun, Apr 24, 2016 at 7:24 PM, Erick Erickson <erickerick...@gmail.com> wrote:

In a word, "yes".

DocValues aren't particularly expensive, or expensive at all. The idea is that when you sort by a field or facet, the field has to be "uninverted", which builds the entire structure in Java's JVM (this is when the field is _not_ DocValues).
DocValues essentially serialize this structure to disk. So your on-disk index size is larger, but that size is MMapped rather than stored on Java's heap.

Really, the question I'd have to ask, though, is "why do you care about the expense?". If you have a functional requirement that has to be served by returning the id via the /export handler, you really have no choice.

Best,
Erick

On Sun, Apr 24, 2016 at 9:55 AM, sudsport s <sudssf2...@gmail.com> wrote:

I was trying to use Streaming for reading a basic tuple stream. I am using sort by id asc, and I am getting the following exception.

I am using the export search handler as per
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets

null:java.io.IOException: id must have DocValues to use this feature.
    at org.apache.solr.response.SortingResponseWriter.getFieldWriters(SortingResponseWriter.java:241)
    at org.apache.solr.response.SortingResponseWriter.write(SortingResponseWriter.java:120)
    at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:53)
    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:742)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:471)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(

Does it make sense to enable docValues for a unique field? How expensive is it?

If I have an existing collection, can I update the schema and optimize the collection to get docValues enabled for id?

--

Thanks
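[Editor's sketch] The schema change being asked about looks roughly like the following, assuming a schema.xml-managed schema (field type name is illustrative). Note that simply optimizing the collection is not enough: docValues are written at index time, so existing documents must be reindexed for the change to take effect.

```xml
<!-- Enable docValues on the unique key field (sketch; existing
     documents must be reindexed for docValues to be populated). -->
<field name="id" type="string" indexed="true" stored="true" docValues="true"/>
```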