Re: query parsing

Alessandro Benedetti Mon, 28 Sep 2015 02:00:03 -0700

Happy to read that! The spellcheck is a separate issue, so send us
further details when you get back to it.


Cheers

2015-09-27 18:59 GMT+01:00 Mark Fenbers <mark.fenb...@noaa.gov>:

> I am delighted to announce that I have it all working again!  Well, not
> all, just the searching!
>
> I deleted my core and created a new one from the command-line (solr
> create_core -c EventLog2) using the basic_configs option. Then I had to add
> my columns to the schema.xml and the dataimport handler to solrconfig.xml
> and tweak a couple of other details. But to make a long story short,
> parsing is working and I can search on terms without wrapping asterisks!!
> Yay!  Thanks for the help!
>
> Spell-checking still isn't working, though, and I'm apprehensive about
> working with it today.  But I will eventually.  The complaint is it can't
> find ELspell, which I had defined in the old setup that I blew away, so
> I'll have to redefine it at some point!  For now, I'm just gonna delight in
> having searching working again!
>
> Mark
>
>
> On 9/26/2015 11:05 PM, Erick Erickson wrote:
>
>> No need to re-install Solr, just create a new core. This time it'd
>> probably be easiest to use the bin/solr create_core command. In the
>> Solr directory just type bin/solr create_core -help to see the options.
>>
>> We're pretty much trying to migrate to using bin/solr for all the
>> maintenance we can, but as always the documentation lags the code.
>>
>> Yeah, things are a bit ragged. The admin UI/core UI is really a legacy
>> bit of code that has _always_ been confusing, I'm hoping we can pretty
>> much remove it at some point since it's as trappy as it is.
>>
>> Best,
>> Erick
>>
>> On Sat, Sep 26, 2015 at 12:49 PM, Mark Fenbers <mark.fenb...@noaa.gov>
>> wrote:
>>
>>> OK, a lot of dialog while I was gone for two days!  I read the whole
>>> thread, but I'm a newbie to Solr, so some of the dialog was Greek to
>>> me.  I understand the words, of course, but applying them so I know
>>> exactly what to do without screwing something else up is the problem.
>>> After all, that is how I got into the mess in the first place.  I'm
>>> glad I have good help to untangle the knots I've made!
>>>
>>> I'd like to start over (option 1 below), but does this mean deleting
>>> all my config and reinstalling Solr?  Maybe that is not a bad idea,
>>> but I will at least save off my data-config.xml, as that is clearly
>>> the one thing that is probably working right.  However, I did do quite
>>> a bit of editing that I would have to do again. Please advise...
>>>
>>> To be fair, I must answer Erick's question of how I created the data
>>> index in the first place, because this might be relevant...
>>>
>>> The bulk of the data is read from 9000+ text files, where each file was
>>> manually typed.  Before inserting into the database, I do a little bit of
>>> processing of the text using "sed" to delete the top few and bottom few
>>> lines, and to substitute each single-quote character with a pair of
>>> single-quotes (so PostgreSQL doesn't choke).  Line-feed characters are
>>> preserved as ASCII 10 (hex 0A), but there shouldn't be (and I am not
>>> aware of) any characters aside from what is on the keyboard.
>>>
>>> Next, I insert it with this command:
>>> psql -U awips -d OHRFC -c "INSERT INTO EventLogText VALUES('$postDate', '$user', '$postDate', '$entryText', '$postCatVal');"
>>>
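The sed preprocessing described above can be sketched in Python. This is only an illustration under assumptions: the function name and the number of lines trimmed from the top and bottom are hypothetical, since the thread only says "the top few and bottom few lines".

```python
def prepare_entry(raw_text: str, skip_top: int = 2, skip_bottom: int = 2) -> str:
    """Trim leading/trailing lines and double single quotes for a SQL literal.

    skip_top and skip_bottom are hypothetical defaults; the thread does not
    give the actual counts.
    """
    lines = raw_text.split("\n")
    end = len(lines) - skip_bottom
    # Keep only the body between the trimmed header and footer lines.
    body = lines[skip_top:end] if end > skip_top else []
    # Line feeds are preserved; each ' becomes '' so the literal stays valid.
    return "\n".join(body).replace("'", "''")


if __name__ == "__main__":
    sample = "header 1\nheader 2\nIt's raining\nfooter 1\nfooter 2"
    print(prepare_entry(sample))
```

If the insert were done from a database client library rather than through psql -c, bound query parameters would normally make the manual quote doubling unnecessary.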
>>> In case you are wondering about my table, it is defined in this way:
>>> CREATE TABLE eventlogtext (
>>>    posttime timestamp without time zone NOT NULL, -- Timestamp of this entry's original posting
>>>    username character varying(8), -- username (logname) of the original poster
>>>    lastmodtime timestamp without time zone, -- Last time record was altered
>>>    logtext text, -- text of the log entry
>>>    category integer, -- bit-wise category value
>>>    CONSTRAINT eventlogtext_pkey PRIMARY KEY (posttime)
>>> )
>>>
>>> To do the indexing, I merely use /dataimport?full-import, but it knows
>>> what to do from my data-config.xml, which is here:
>>>
>>> <dataConfig>
>>>      <dataSource driver="org.postgresql.Driver"
>>>                  url="jdbc:postgresql://dx1f/OHRFC" user="awips" />
>>>      <document>
>>>          <entity name="eventlogtext"
>>>                  query="SELECT posttime AS id, username, logtext, category FROM eventlogtext;"
>>>                  deltaQuery="SELECT posttime AS id FROM eventlogtext WHERE lastmodtime > '${dataimporter.last_index_time}';">
>>>              <entity name="categorytypes"
>>>                      query="SELECT catname FROM categorytypes WHERE catid='${eventlogtext.category}';">
>>>              </entity>
>>>          </entity>
>>>      </document>
>>> </dataConfig>
>>>
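For reference, the DataImportHandler selects its operation through a "command" request parameter (full-import, delta-import, status and so on), so "/dataimport?full-import" above is presumably shorthand for "/dataimport?command=full-import". A minimal sketch of building that URL follows; the host, port and core name are assumptions, not values confirmed for this setup.

```python
from urllib.parse import urlencode


def dataimport_url(base_url: str, core: str, command: str = "full-import") -> str:
    """Build a DataImportHandler request URL for the given core.

    base_url and core are assumed values supplied by the caller.
    """
    query = urlencode({"command": command})
    return f"{base_url.rstrip('/')}/{core}/dataimport?{query}"


if __name__ == "__main__":
    # Assumed local Solr instance and the core name mentioned in the thread.
    print(dataimport_url("http://localhost:8983/solr", "EventLog2"))
    print(dataimport_url("http://localhost:8983/solr", "EventLog2", "delta-import"))
```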
>>> Hope this helps!
>>>
>>> Thanks,
>>> Mark
>>>
>>> On 9/24/2015 10:57 AM, Erick Erickson wrote:
>>>
>>>> Geraint:
>>>>
>>>> Good catch! I totally missed that. So all of our focus on schema.xml has
>>>> been... totally irrelevant. Now that you pointed that out, there's also
>>>> the addition: add-unknown-fields-to-the-schema, which indicates you
>>>> started this up in "schemaless" mode.
>>>>
>>>> In short, Solr is trying to guess what your field types should be and
>>>> guessing wrong (again and again and again). This is the classic weakness
>>>> of schemaless. It's great for indexing stuff fast, but if it guesses
>>>> wrong you're stuck.
>>>>
>>>>
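To illustrate why a wrong field-type guess matters, here is a deliberately simplified model, not Solr's actual analysis chain. It assumes the guessed type indexed logtext untokenized, which would match the very long "terms" reported in the schema browser; the function names are hypothetical.

```python
import re


def terms_for_string_field(value: str) -> list[str]:
    # An untokenized (string-like) field indexes the whole value as one term.
    return [value]


def terms_for_text_field(value: str) -> list[str]:
    # Rough stand-in for a tokenizer plus lowercase filter on a text field.
    return re.findall(r"\w+", value.lower())


if __name__ == "__main__":
    entry = "River crest\nforecast updated"
    print(terms_for_string_field(entry))
    print(terms_for_text_field(entry))
```

With a single whole-value term, a query for "forecast" only matches when wrapped in wildcards, which is consistent with the behaviour described elsewhere in this thread.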
>>>> So to the original problem: I'd start over and either
>>>> 1> use the regular setup, not schemaless
>>>> or
>>>> 2> use the _managed_ schema API to explicitly add fields and fieldTypes
>>>> to the managed schema
>>>>
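Option 2 could look roughly like the sketch below, which builds the JSON body for the Schema API "add-field" command. The field name comes from this thread, but the text_general type and the stored/indexed choices are assumptions about what the configset provides. The body would be POSTed with Content-type application/json to the core's /schema endpoint, for example http://localhost:8983/solr/EventLog2/schema on an assumed local install.

```python
import json


def add_field_command(name: str, field_type: str,
                      stored: bool = True, indexed: bool = True) -> str:
    """Return the JSON body for a Schema API add-field request."""
    return json.dumps({
        "add-field": {
            "name": name,
            "type": field_type,   # assumed to exist in the managed schema
            "stored": stored,
            "indexed": indexed,
        }
    })


if __name__ == "__main__":
    print(add_field_command("logtext", "text_general"))
```

This only works when the schema is managed and mutable, as in the schemaFactory snippet quoted further down.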
>>>> Best,
>>>> Erick
>>>>
>>>> On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <
>>>> geraint.d...@syngenta.com> wrote:
>>>>
>>>>> Okay, so maybe I'm missing something here (I'm still relatively new to
>>>>> Solr myself), but am I right in thinking the following is still in your
>>>>> solrconfig.xml file:
>>>>>
>>>>>     <schemaFactory class="ManagedIndexSchemaFactory">
>>>>>       <bool name="mutable">true</bool>
>>>>>       <str name="managedSchemaResourceName">managed-schema</str>
>>>>>     </schemaFactory>
>>>>>
>>>>> If so, wouldn't using a managed schema make several of your field
>>>>> definitions inside the schema.xml file semi-redundant?
>>>>>
>>>>> Regards,
>>>>> Geraint
>>>>>
>>>>>
>>>>> Geraint Duck
>>>>> Data Scientist
>>>>> Toxicology and Health Sciences
>>>>> Syngenta UK
>>>>> Email: geraint.d...@syngenta.com
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
>>>>> Sent: 24 September 2015 09:23
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: query parsing
>>>>>
>>>>> I would focus on this:
>>>>>
>>>>> "5> now kick off the DIH job and look again.
>>>>>
>>>>> Now it shows a histogram, but most of the "terms" are long -- the full
>>>>> texts of (the table.column) eventlogtext.logtext, including the
>>>>> whitespace (with %0A used for newline characters)...  So, it appears
>>>>> it is not being tokenized properly, correct?"
>>>>>
>>>>> Can you open the schema.xml from your Solr UI and show us the snippet
>>>>> for the field that does not seem to be tokenised?
>>>>> Can you show us (even a screenshot is fine) the related schema browser
>>>>> page?
>>>>> Could it be an encoding problem?
>>>>> Following Erick's details about the analysis, what are your results?
>>>>>
>>>>> Cheers
>>>>>
>>>>> 2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:
>>>>>
>>>>>> Typically, the index dir is inside the data dir. Delete the index dir
>>>>>> and you should be good. If there is a tlog next to it, you might want
>>>>>> to delete that also.
>>>>>>
>>>>>> If you don't have a data dir, I wonder whether you set the data dir
>>>>>> when creating your core or collection. Typically the instance dir and
>>>>>> data dir aren't needed.
>>>>>>
>>>>>> Upayavira
>>>>>>
>>>>>> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
>>>>>>
>>>>>>> OK, this is bizarre. You'd have had to set up SolrCloud by
>>>>>>> specifying the -zkRun command when you start Solr or the -zkHost;
>>>>>>> highly unlikely. On the admin page there would be a "cloud" link on
>>>>>>> the left side, I really doubt one's there.
>>>>>>>
>>>>>>> You should have a data directory, it should be the parent of the
>>>>>>> index and tlog directories. As a sanity check, try looking at the
>>>>>>> analysis page. Type a bunch of words in the left hand side indexing
>>>>>>> box and uncheck the verbose box. As you can tell I'm grasping at
>>>>>>> straws. I'm still puzzled why you don't have a "data" directory
>>>>>>> here, but that shouldn't really matter. How did you create this
>>>>>>> index? I don't mean the data import handler; rather, how did you
>>>>>>> create the core that you're indexing to?
>>>>>>>
>>>>>>> Best,
>>>>>>> Erick
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
>>>>>>> <mark.fenb...@noaa.gov>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 9/23/2015 12:30 PM, Erick Erickson wrote:
>>>>>>>>
>>>>>>>>> Then my next guess is you're not pointing at the index you think
>>>>>>>>> you are when you 'rm -rf data'
>>>>>>>>>
>>>>>>>>> Just ignore the Elall field for now I should think, although get
>>>>>>>>> rid of it if you don't think you need it.
>>>>>>>>>
>>>>>>>>> DIH should be irrelevant here.
>>>>>>>>>
>>>>>>>>> So let's back up.
>>>>>>>>> 1> go ahead and "rm -fr data" (with Solr stopped).
>>>>>>>>
>>>>>>>> I have no "data" dir.  Did you mean "index" dir?  I removed 3
>>>>>>>> index directories (2 for spelling):
>>>>>>>> cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
>>>>>>>>
>>>>>>>>> 2> start Solr
>>>>>>>>> 3> do NOT re-index.
>>>>>>>>> 4> look at your index via the schema-browser. Of course there
>>>>>>>>> should be nothing there!
>>>>>>>>
>>>>>>>> Correct!  It said "there is no term info :("
>>>>>>>>
>>>>>>>>> 5> now kick off the DIH job and look again.
>>>>>>>>
>>>>>>>> Now it shows a histogram, but most of the "terms" are long -- the
>>>>>>>> full texts of (the table.column) eventlogtext.logtext, including
>>>>>>>> the whitespace (with %0A used for newline characters)...  So, it
>>>>>>>> appears it is not being tokenized properly, correct?
>>>>>>>>
>>>>>>>>> Your logtext field should have only single tokens. The fact that
>>>>>>>>> you have some very long tokens (presumably with whitespace)
>>>>>>>>> indicates that you aren't really blowing the index away between
>>>>>>>>> indexing.
>>>>>>>>
>>>>>>>> Well, I did this time for sure.  I verified that initially,
>>>>>>>> because it showed there was no term info until I DIH'd again.
>>>>>>>>
>>>>>>>>> Are you perhaps in Solr Cloud with more than one replica?
>>>>>>>>
>>>>>>>> Not that I know of, but being new to Solr, there could be things
>>>>>>>> going on that I'm not aware of.  How can I tell?  I certainly
>>>>>>>> didn't set anything up for solrCloud deliberately.
>>>>>>>>
>>>>>>>>> In that case you
>>>>>>>>> might be getting the index replicated on startup assuming you
>>>>>>>>> didn't blow away all replicas. If you are in SolrCloud, I'd just
>>>>>>>>> delete the collection and start over, after ensuring that you'd
>>>>>>>>> pushed the configset up to Zookeeper.
>>>>>>>>>
>>>>>>>>> BTW, I always look at the schema.xml file from the Solr admin
>>>>>>>>> window just as a sanity check in these situations.
>>>>>>>>
>>>>>>>> Good idea!  But the one shown in the browser is identical to the
>>>>>>>> one I've been editing!  So that's not an issue.
>>>>>>>>
>>>>>>>>
>>>>> --
>>>>> --------------------------
>>>>>
>>>>> Benedetti Alessandro
>>>>> Visiting card - http://about.me/alessandro_benedetti
>>>>> Blog - http://alexbenedetti.blogspot.co.uk
>>>>>
>>>>> "Tyger, tyger burning bright
>>>>> In the forests of the night,
>>>>> What immortal hand or eye
>>>>> Could frame thy fearful symmetry?"
>>>>>
>>>>> William Blake - Songs of Experience -1794 England
>


-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
