Geraint:

Good catch! I totally missed that. So all of our focus on schema.xml has
been... totally irrelevant. Now that you've pointed that out, there's also the
add-unknown-fields-to-the-schema update processor chain, which indicates you
started this up in "schemaless" mode.

In short, Solr is trying to guess what your field types should be and
guessing wrong (again and again and again). This is the classic weakness of
schemaless: it's great for indexing stuff fast, but if it guesses wrong
you're stuck.
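A toy Python sketch of what "guessing" means here. This is not Solr's implementation: the type names (`boolean`, `long`, `double`, `date`, `string`) and the fallback to an untokenized `string` type are assumptions for illustration. What it does mimic is that the first value seen for a field fixes that field's type for every later document.

```python
from datetime import datetime


def guess_field_type(value: str, default_type: str = "string") -> str:
    """Toy guesser: try boolean, integer, float and date parses in turn."""
    if value.strip().lower() in ("true", "false"):
        return "boolean"
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return "date"
    except ValueError:
        pass
    # Nothing parsed: fall back to the (assumed) default type.
    return default_type


def infer_schema(docs, default_type: str = "string") -> dict:
    """Assign each field a type based on the first value seen for it."""
    schema = {}
    for doc in docs:
        for name, value in doc.items():
            # setdefault: once a field has a type, later values never change it.
            schema.setdefault(name, guess_field_type(value, default_type))
    return schema
```

With `docs = [{"code": "42", "logtext": "disk full"}, {"code": "N/A", "logtext": "ok"}]`, `infer_schema(docs)` gives `{"code": "long", "logtext": "string"}`. The second `code` value no longer fits its type, and if the fallback type is not tokenized, each whole `logtext` value becomes a single term, which would match the long "terms" reported further down.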


So to the original problem: I'd start over and either
1> use the regular setup, not schemaless
or
2> use the _managed_ schema API to explicitly add fields and fieldTypes to
the managed schema
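For option 2>, a minimal Python sketch of sending field definitions to the Schema API using only the standard library. The core name, the field name and the `text_general` type are assumptions (check that the type actually exists in your managed schema), and it is meant for a fresh core, since adding a field that already exists is rejected.

```python
import json
import urllib.request


def add_fields_body(fields):
    """Build a Schema API body with one add-field entry per (name, type)."""
    return {
        "add-field": [
            {"name": name, "type": field_type, "indexed": True, "stored": True}
            for name, field_type in fields
        ]
    }


def post_schema(base_url, core, body):
    """POST the JSON commands to <base_url>/<core>/schema and return the reply."""
    req = urllib.request.Request(
        f"{base_url}/{core}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage, against a running Solr and a hypothetical core named `eventlog`: `post_schema("http://localhost:8983/solr", "eventlog", add_fields_body([("logtext", "text_general")]))`.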

Best,
Erick

On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <
geraint.d...@syngenta.com> wrote:

> Okay, so maybe I'm missing something here (I'm still relatively new to
> Solr myself), but am I right in thinking the following is still in your
> solrconfig.xml file:
>
>   <schemaFactory class="ManagedIndexSchemaFactory">
>     <bool name="mutable">true</bool>
>     <str name="managedSchemaResourceName">managed-schema</str>
>   </schemaFactory>
>
> If so, wouldn't using a managed schema make several of your field
> definitions inside the schema.xml file semi-redundant?
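The check Geraint describes can also be scripted. A minimal sketch, assuming the contents of solrconfig.xml have been read into a string; `schema_factory_info` is a hypothetical helper name. It returns `(None, False)` when there is no explicit `<schemaFactory>` element rather than guessing the version-dependent default.

```python
import xml.etree.ElementTree as ET


def schema_factory_info(solrconfig_xml: str):
    """Return (factory class, mutable flag) from solrconfig.xml text."""
    root = ET.fromstring(solrconfig_xml)
    factory = root.find("schemaFactory")
    if factory is None:
        # No explicit <schemaFactory>: let the caller decide what that means.
        return None, False
    mutable = factory.find("bool[@name='mutable']")
    is_mutable = mutable is not None and (mutable.text or "").strip() == "true"
    return factory.get("class"), is_mutable
```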
>
> Regards,
> Geraint
>
>
> Geraint Duck
> Data Scientist
> Toxicology and Health Sciences
> Syngenta UK
> Email: geraint.d...@syngenta.com
>
>
> -----Original Message-----
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: 24 September 2015 09:23
> To: solr-user@lucene.apache.org
> Subject: Re: query parsing
>
> I would focus on this:
>
> > 5> now kick off the DIH job and look again.
> >
> > Now it shows a histogram, but most of the "terms" are long -- the full
> > texts of (the table.column) eventlogtext.logtext, including the whitespace
> > (with %0A used for newline characters)...  So, it appears it is not being
> > tokenized properly, correct?
>
> Can you open the schema.xml from your Solr UI and show us the snippets
> for the field that doesn't seem to be tokenized?
> Can you show us the related schema browser page (even a screenshot is
> fine)?
> Could it be an encoding problem?
> Following Erick's details about the analysis page, what are your results?
>
> Cheers
>
> 2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:
>
> > Typically, the index dir is inside the data dir. Delete the index dir
> > and you should be good. If there is a tlog dir next to it, you might
> > want to delete that also.
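A minimal Python sketch of that cleanup, to be run only with Solr stopped. `wipe_index` is a hypothetical helper; in Mark's layout (per his message quoted further down) the index sat directly under /localapps/dev/eventLog, so that would be the directory to pass.

```python
import shutil
from pathlib import Path


def wipe_index(data_dir):
    """Delete the index and tlog directories under data_dir.

    Only run this with Solr stopped. Returns the names of the
    directories that were actually removed.
    """
    removed = []
    for name in ("index", "tlog"):
        target = Path(data_dir) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```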
> >
> > If you don't have a data dir, I wonder whether you set the data dir
> > when creating your core or collection. Typically, specifying the
> > instance dir and data dir isn't needed.
> >
> > Upayavira
> >
> > On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > > OK, this is bizarre. You'd have had to set up SolrCloud by
> > > specifying -zkRun or -zkHost when you started Solr, which is
> > > highly unlikely. On the admin page there would be a "Cloud" link on
> > > the left side; I really doubt one's there.
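Besides looking for the "Cloud" link, the mode can be checked over HTTP. A sketch assuming the system info handler at /admin/info/system reports a top-level `mode` value (`solrcloud` or `std`); verify that against the actual response from your version. The base URL is also an assumption.

```python
import json
import urllib.request


def fetch_system_info(base_url="http://localhost:8983/solr"):
    """GET /admin/info/system as JSON (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/admin/info/system?wt=json") as resp:
        return json.load(resp)


def is_solrcloud(system_info: dict) -> bool:
    """True when the (assumed) top-level "mode" key says solrcloud."""
    return system_info.get("mode") == "solrcloud"
```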
> > >
> > > You should have a data directory; it should be the parent of the
> > > index and tlog directories. As a sanity check, try looking at the
> > > analysis page. Type a bunch of words in the left-hand indexing box
> > > and uncheck the verbose box. As you can tell, I'm grasping at straws.
> > > I'm still puzzled why you don't have a "data" directory here, but
> > > that shouldn't really matter. How did you create this index? I don't
> > > mean the data import handler; I mean, how did you create the core
> > > that you're indexing to?
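The analysis page check can also be driven by URL. A sketch assuming a FieldAnalysisRequestHandler is registered at /analysis/field in your solrconfig.xml (check that it is); the core and field names in any call are hypothetical. Fetching the resulting URL shows the tokens each analyzer stage produces for the sample text.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field, text):
    """Build a field-analysis request URL for one field and some sample text."""
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": text,
        "wt": "json",
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"
```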
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
> > > <mark.fenb...@noaa.gov>
> > > wrote:
> > >
> > > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > > >
> > > >> Then my next guess is you're not pointing at the index you think
> > > >> you are when you 'rm -rf data'.
> > > >>
> > > >> Just ignore the Elall field for now, I should think, although get
> > > >> rid of it if you don't think you need it.
> > > >>
> > > >> DIH should be irrelevant here.
> > > >>
> > > >> So let's back up.
> > > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > > >>
> > > > I have no "data" dir.  Did you mean "index" dir?  I removed 3
> > > > index directories (2 for spelling):
> > > > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> > > >
> > > >> 2> start Solr
> > > >> 3> do NOT re-index.
> > > >> 4> look at your index via the schema browser. Of course there
> > > >> should be nothing there!
> > > >>
> > > > Correct!  It said "there is no term info :("
> > > >
> > > >> 5> now kick off the DIH job and look again.
> > > >>
> > > > Now it shows a histogram, but most of the "terms" are long -- the
> > > > full texts of (the table.column) eventlogtext.logtext, including
> > > > the whitespace (with %0A used for newline characters)...  So, it
> > > > appears it is not being tokenized properly, correct?
> > > >
> > > >> Your logtext field should contain only single-word tokens. The
> > > >> fact that you have some very long tokens (presumably with
> > > >> whitespace) indicates that you aren't really blowing the index
> > > >> away between indexing runs.
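A quick way to spot the symptom described here: a hypothetical helper that flags terms which still contain whitespace once URL-decoded (the %0A seen in the schema browser decodes to a newline). It assumes the terms have been copied out of the schema browser into a list. In a field whose analyzer splits on whitespace, none should be flagged.

```python
from urllib.parse import unquote


def untokenized_terms(terms):
    """Return the terms that contain whitespace after URL-decoding.

    Any hit suggests the field's values are being indexed whole
    rather than split into words.
    """
    return [t for t in terms if any(ch.isspace() for ch in unquote(t))]
```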
> > > >>
> > > > Well, I did this time for sure.  I verified that initially,
> > > > because it showed there was no term info until I DIH'd again.
> > > >
> > > >> Are you perhaps in Solr Cloud with more than one replica?
> > > >>
> > > > Not that I know of, but being new to Solr, there could be things
> > > > going on that I'm not aware of.  How can I tell?  I certainly
> > > > didn't set anything up for SolrCloud deliberately.
> > > >
> > > >> In that case you might be getting the index replicated on
> > > >> startup, assuming you didn't blow away all replicas. If you are in
> > > >> SolrCloud, I'd just delete the collection and start over, after
> > > >> ensuring that you'd pushed the configset up to ZooKeeper.
> > > >>
> > > >> BTW, I always look at the schema.xml file from the Solr admin
> > > >> window just as a sanity check in these situations.
> > > >>
> > > > Good idea!  But the one shown in the browser is identical to the
> > > > one I've been editing!  So that's not an issue.
> > > >
> > > >
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
