OK, based on the stack trace I suspect one of your sort fields has NULL
values: in the 5x branch the streaming comparator could throw a
NullPointerException when a segment had no values for a sort field. This,
like the escaping issue, is fixed in the Solr 6x branch.

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, Dec 17, 2016 at 2:44 PM, Chetas Joshi <chetas.jo...@gmail.com>
wrote:

> Here is the stack trace.
>
> java.lang.NullPointerException
>         at org.apache.solr.client.solrj.io.comp.FieldComparator$2.compare(FieldComparator.java:85)
>         at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:92)
>         at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:30)
>         at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:45)
>         at org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:33)
>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:396)
>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.compareTo(CloudSolrStream.java:381)
>         at java.util.TreeMap.put(TreeMap.java:560)
>         at java.util.TreeSet.add(TreeSet.java:255)
>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:366)
>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)
>         at *.*.*.*.SolrStreamResultIterator$$anon$1.run(SolrStreamResultIterator.scala:101)
>         at java.lang.Thread.run(Thread.java:745)
>
> 16/11/17 13:04:31 *ERROR* SolrStreamResultIterator:missing exponent number: char=A,position=106596
> BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
> AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
>
> org.noggit.JSONParser$ParseException: missing exponent number: char=A,position=106596
> BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
> AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
>
>         at org.noggit.JSONParser.err(JSONParser.java:356)
>         at org.noggit.JSONParser.readExp(JSONParser.java:513)
>         at org.noggit.JSONParser.readNumber(JSONParser.java:419)
>         at org.noggit.JSONParser.next(JSONParser.java:845)
>         at org.noggit.JSONParser.nextEvent(JSONParser.java:951)
>         at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:127)
>         at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57)
>         at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
>         at org.apache.solr.client.solrj.io.stream.JSONTupleStream.next(JSONTupleStream.java:84)
>         at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:147)
>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next(CloudSolrStream.java:413)
>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream._read(CloudSolrStream.java:365)
>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream.read(CloudSolrStream.java:353)
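>
> The BEFORE/AFTER context shows the stream died in the middle of a number
> ("timestamp": 6EA), which suggests the response stream itself was
> corrupted or cut off, rather than the field values containing bad
> characters. The parser error is easy to reproduce outside Solr; a minimal
> sketch, assuming only noggit on the classpath (the class name is mine):
>
> import org.noggit.JSONParser;
> import org.noggit.ObjectBuilder;
>
> public class MissingExponentRepro {
>     public static void main(String[] args) throws Exception {
>         // A well-formed document parses fine:
>         System.out.println(ObjectBuilder.fromJSON("{\"timestamp\": 1477189323}"));
>
>         // "6EA" starts like a number in scientific notation (6E...) but
>         // has no exponent digits, so noggit rejects it:
>         try {
>             ObjectBuilder.fromJSON("{\"timestamp\": 6EA}");
>         } catch (JSONParser.ParseException e) {
>             System.out.println(e.getMessage()); // "missing exponent number ..."
>         }
>     }
> }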
>
>
> Thanks!
>
> On Fri, Dec 16, 2016 at 11:45 PM, Reth RM <reth.ik...@gmail.com> wrote:
>
> > If you could provide the JSON parse exception stack trace, it might help
> > to pinpoint the issue.
> >
> >
> > On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi <chetas.jo...@gmail.com>
> > wrote:
> >
> > > Hi Joel,
> > >
> > > The only non-alphanumeric characters I have in my data are '+' and '/'.
> > > I don't have any backslashes.
> > >
> > > If the special characters were the issue, I should be getting the JSON
> > > parsing exceptions every time, irrespective of the index size and of
> > > the available memory on the machine. That is not the case here. The
> > > streaming API successfully returns all the documents when the index is
> > > small enough to fit in the available memory. That's the reason I am
> > > confused.
> > >
> > > Thanks!
> > >
> > > On Fri, Dec 16, 2016 at 5:43 PM, Joel Bernstein <joels...@gmail.com>
> > > wrote:
> > >
> > > > The Streaming API may have been throwing exceptions because the JSON
> > > > special characters were not escaped. This was fixed in Solr 6.0.
> > > >
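> > > > Roughly what that looks like at the JSON level (an illustrative
> > > > sketch against noggit directly, not the actual Solr code path; the
> > > > class name is mine):
> > > >
> > > > import org.noggit.ObjectBuilder;
> > > >
> > > > public class EscapingSketch {
> > > >     public static void main(String[] args) throws Exception {
> > > >         // A writer that emits a field value containing '"' without
> > > >         // escaping it produces malformed JSON:
> > > >         String unescaped = "{\"id\":\"a\"b\"}"; // value a"b, quote not escaped
> > > >         try {
> > > >             ObjectBuilder.fromJSON(unescaped);
> > > >         } catch (Exception e) {
> > > >             System.out.println(e); // JSONParser$ParseException
> > > >         }
> > > >
> > > >         // Properly escaped, the same value parses fine:
> > > >         String escaped = "{\"id\":\"a\\\"b\"}"; // a\"b on the wire
> > > >         System.out.println(ObjectBuilder.fromJSON(escaped));
> > > >     }
> > > > }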
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi <chetas.jo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I am running Solr 5.5.0.
> > > > > It is a SolrCloud of 50 nodes and I have the following config for
> > > > > all the collections:
> > > > > maxShardsPerNode: 1
> > > > > replicationFactor: 1
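> > > > >
> > > > > For reference, that corresponds to a Collections API CREATE call
> > > > > along these lines (collection and configset names are placeholders;
> > > > > numShards=50 assumes one shard per node as described):
> > > > >
> > > > > curl "http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=50&replicationFactor=1&maxShardsPerNode=1&collection.configName=myconf"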
> > > > >
> > > > > I was using the Streaming API to get results back from Solr. It
> > > > > worked fine for a while, until the index data size grew beyond 40 GB
> > > > > per shard (i.e. per node). Then it started throwing JSON parsing
> > > > > exceptions while reading the TupleStream data. FYI: I have other
> > > > > services (Yarn, Spark) deployed on the same boxes on which the Solr
> > > > > shards are running. Spark jobs also use a lot of disk cache, so the
> > > > > free disk cache available on a box varies a lot depending on what
> > > > > else is running on it.
> > > > >
> > > > > Due to this issue, I moved to the cursorMark approach. It works
> > > > > fine, but as we all know it is way slower than the streaming
> > > > > approach.
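> > > > >
> > > > > For concreteness, the two access patterns look roughly like this in
> > > > > SolrJ 5.5 (a sketch; zkHost, collection, and field names are
> > > > > placeholders):
> > > > >
> > > > > import java.util.HashMap;
> > > > > import java.util.Map;
> > > > > import org.apache.solr.client.solrj.SolrQuery;
> > > > > import org.apache.solr.client.solrj.impl.CloudSolrClient;
> > > > > import org.apache.solr.client.solrj.io.Tuple;
> > > > > import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
> > > > > import org.apache.solr.client.solrj.response.QueryResponse;
> > > > > import org.apache.solr.common.params.CursorMarkParams;
> > > > >
> > > > > public class StreamVsCursor {
> > > > >     public static void main(String[] args) throws Exception {
> > > > >         String zkHost = "zk1:2181,zk2:2181/solr"; // placeholder
> > > > >
> > > > >         // Streaming: /export streams the full sorted result set
> > > > >         // from every shard, merged client-side by CloudSolrStream.
> > > > >         Map<String, String> props = new HashMap<>();
> > > > >         props.put("q", "*:*");
> > > > >         props.put("fl", "uuid,timestamp");
> > > > >         props.put("sort", "uuid asc");
> > > > >         props.put("qt", "/export");
> > > > >         CloudSolrStream stream = new CloudSolrStream(zkHost, "mycollection", props);
> > > > >         try {
> > > > >             stream.open();
> > > > >             for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
> > > > >                 // process t.getString("uuid") ...
> > > > >             }
> > > > >         } finally {
> > > > >             stream.close();
> > > > >         }
> > > > >
> > > > >         // cursorMark: repeated top-N queries that page through the
> > > > >         // result set; the sort must include the uniqueKey field.
> > > > >         try (CloudSolrClient client = new CloudSolrClient(zkHost)) {
> > > > >             SolrQuery q = new SolrQuery("*:*");
> > > > >             q.setFields("uuid", "timestamp");
> > > > >             q.setRows(1000);
> > > > >             q.setSort(SolrQuery.SortClause.asc("uuid"));
> > > > >             String cursor = CursorMarkParams.CURSOR_MARK_START;
> > > > >             while (true) {
> > > > >                 q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
> > > > >                 QueryResponse rsp = client.query("mycollection", q);
> > > > >                 // process rsp.getResults() ...
> > > > >                 String next = rsp.getNextCursorMark();
> > > > >                 if (cursor.equals(next)) break; // cursor did not advance: done
> > > > >                 cursor = next;
> > > > >             }
> > > > >         }
> > > > >     }
> > > > > }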
> > > > >
> > > > > Currently the index size per shard is 80 GB. (The machine has 512 GB
> > > > > of RAM, shared by different services/programs and their heap,
> > > > > off-heap, and disk cache requirements.)
> > > > >
> > > > > When there is enough RAM available on the machine (more than 80 GB,
> > > > > so that all the index data can fit in memory), the streaming API
> > > > > succeeds without running into any exceptions.
> > > > >
> > > > > Questions:
> > > > > How does the index data caching mechanism (for HDFS) differ between
> > > > > the Streaming API and the cursorMark approach?
> > > > > Why does the cursor approach work every time, while streaming works
> > > > > only when there is a lot of free disk cache?
> > > > >
> > > > > Thank you.
> > > > >
> > > >
> > >
> >
>
