Happy to read that! The spellcheck is a separate issue, so let us know if you need more help with it.
Cheers

2015-09-27 18:59 GMT+01:00 Mark Fenbers <mark.fenb...@noaa.gov>:

> I am delighted to announce that I have it all working again! Well, not
> all, just the searching!
>
> I deleted my core and created a new one from the command-line (solr
> create_core -c EventLog2) using the basic_configs option. Then I had to add
> my columns to the schema.xml and the dataimport handler to solrconfig.xml
> and tweak a couple of other details. But to make a long story short,
> parsing is working and I can search on terms without wrapping asterisks!!
> Yay! Thanks for the help!
>
> Spell-checking still isn't working, though, and I'm apprehensive about
> working with it today. But I will eventually. The complaint is it can't
> find ELspell, which I had defined in the old setup that I blew away, so
> I'll have to redefine it at some point! For now, I'm just gonna delight in
> having searching working again!
>
> Mark
>
> On 9/26/2015 11:05 PM, Erick Erickson wrote:
>
>> No need to re-install Solr, just create a new core, this time it'd
>> probably be easiest to use the bin/solr create_core command. In the Solr
>> directory just type bin/solr create_core -help to see the options.
>>
>> We're pretty much trying to migrate to using bin/solr for all the
>> maintenance we can, but as always the documentation lags the code.
>>
>> Yeah, things are a bit ragged. The admin UI/core UI is really a legacy
>> bit of code that has _always_ been confusing, I'm hoping we can pretty
>> much remove it at some point since it's as trappy as it is.
>>
>> Best,
>> Erick
>>
>> On Sat, Sep 26, 2015 at 12:49 PM, Mark Fenbers <mark.fenb...@noaa.gov>
>> wrote:
>>
>>> OK, a lot of dialog while I was gone for two days! I read the whole
>>> thread, but I'm a newbie to Solr, so some of the dialog was Greek to me.
>>> I understand the words, of course, but applying it so I know exactly
>>> what to do without screwing something else up is the problem. After all,
>>> that is how I got into the mess in the first place. I'm glad I have good
>>> help to untangle the knots I've made!
>>>
>>> I'd like to start over (option 1 below), but does this mean delete all my
>>> config and reinstalling Solr?? Maybe that is not a bad idea, but I will
>>> at least save off my data-config.xml as that is clearly the one thing
>>> that is probably working right. However, I did do quite a bit of editing
>>> that I would have to do again. Please advise...
>>>
>>> To be fair, I must answer Erick's question of how I created the data
>>> index in the first place, because this might be relevant...
>>>
>>> The bulk of the data is read from 9000+ text files, where each file was
>>> manually typed. Before inserting into the database, I do a little bit of
>>> processing of the text using "sed" to delete the top few and bottom few
>>> lines, and to substitute each single-quote character with a pair of
>>> single-quotes (so PostgreSQL doesn't choke). Line-feed characters are
>>> preserved as ASCII 10 (hex 0A), but there shouldn't be (and I am not
>>> aware of) any characters aside from what is on the keyboard.
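As a rough illustration of the preprocessing Mark describes above (dropping the top and bottom few lines, then doubling each single quote so the text can sit inside a SQL string literal), here is a minimal Python sketch. The function name `prepare_entry` and the `skip_top`/`skip_bottom` counts are assumptions for illustration only; the thread does not say how many lines his sed script removes.

```python
def prepare_entry(raw_text: str, skip_top: int = 2, skip_bottom: int = 2) -> str:
    """Trim header/footer lines and double single quotes for a SQL literal.

    skip_top and skip_bottom are hypothetical counts; the exact values used
    by the original sed script are not given in the thread.
    """
    lines = raw_text.splitlines()
    end = len(lines) - skip_bottom
    # Keep only the lines between the header and the footer.
    body = lines[skip_top:end] if end > skip_top else []
    # Line feeds are preserved as "\n" (ASCII 10); each ' becomes ''.
    return "\n".join(body).replace("'", "''")


sample = "header 1\nheader 2\nIt's raining\nRiver's up\nfooter 1\nfooter 2\n"
print(prepare_entry(sample))
```

If the insert were done from a client library that supports bound parameters, the quote doubling would not be needed, since the driver takes care of escaping.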
>>>
>>> Next, I insert it with this command:
>>> psql -U awips -d OHRFC -c "INSERT INTO EventLogText VALUES('$postDate', '$user', '$postDate', '$entryText', '$postCatVal');"
>>>
>>> In case you are wondering about my table, it is defined in this way:
>>> CREATE TABLE eventlogtext (
>>>   posttime timestamp without time zone NOT NULL, -- Timestamp of this entry's original posting
>>>   username character varying(8), -- username (logname) of the original poster
>>>   lastmodtime timestamp without time zone, -- Last time record was altered
>>>   logtext text, -- text of the log entry
>>>   category integer, -- bit-wise category value
>>>   CONSTRAINT eventlogtext_pkey PRIMARY KEY (posttime)
>>> )
>>>
>>> To do the indexing, I merely use /dataimport?full-import, but it knows
>>> what to do from my data-config.xml; which is here:
>>>
>>> <dataConfig>
>>>   <dataSource driver="org.postgresql.Driver"
>>>       url="jdbc:postgresql://dx1f/OHRFC" user="awips" />
>>>   <document>
>>>     <entity name="eventlogtext"
>>>         query="SELECT posttime AS id, username, logtext, category FROM eventlogtext;"
>>>         deltaQuery="SELECT posttime AS id FROM eventlogtext WHERE lastmodtime > '${dataimporter.last_index_time}';">
>>>       <entity name="categorytypes"
>>>           query="SELECT catname FROM categorytypes WHERE catid='${eventlogtext.category}';">
>>>       </entity>
>>>     </entity>
>>>   </document>
>>> </dataConfig>
>>>
>>> Hope this helps!
>>>
>>> Thanks,
>>> Mark
>>>
>>> On 9/24/2015 10:57 AM, Erick Erickson wrote:
>>>
>>>> Geraint:
>>>>
>>>> Good Catch! I totally missed that. So all of our focus on schema.xml has
>>>> been... totally irrelevant. Now that you pointed that out, there's also
>>>> the addition: add-unknown-fields-to-the-schema, which indicates you
>>>> started this up in "schemaless" mode.
>>>>
>>>> In short, solr is trying to guess what your field types should be and
>>>> guessing wrong (again and again and again). This is the classic weakness
>>>> of schemaless.
>>>> It's great for indexing stuff fast, but if it guesses wrong
>>>> you're stuck.
>>>>
>>>> So to the original problem: I'd start over and either
>>>> 1> use the regular setup, not schemaless
>>>> or
>>>> 2> use the _managed_ schema API to explicitly add fields and fieldTypes
>>>>    to the managed schema
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <
>>>> geraint.d...@syngenta.com> wrote:
>>>>
>>>>> Okay, so maybe I'm missing something here (I'm still relatively new to
>>>>> Solr myself), but am I right in thinking the following is still in your
>>>>> solrconfig.xml file:
>>>>>
>>>>> <schemaFactory class="ManagedIndexSchemaFactory">
>>>>>   <bool name="mutable">true</bool>
>>>>>   <str name="managedSchemaResourceName">managed-schema</str>
>>>>> </schemaFactory>
>>>>>
>>>>> If so, wouldn't using a managed schema make several of your field
>>>>> definitions inside the schema.xml file semi-redundant?
>>>>>
>>>>> Regards,
>>>>> Geraint
>>>>>
>>>>> Geraint Duck
>>>>> Data Scientist
>>>>> Toxicology and Health Sciences
>>>>> Syngenta UK
>>>>> Email: geraint.d...@syngenta.com
>>>>>
>>>>> -----Original Message-----
>>>>> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
>>>>> Sent: 24 September 2015 09:23
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: query parsing
>>>>>
>>>>> I would focus on this :
>>>>>
>>>>> "
>>>>> 5> now kick off the DIH job and look again.
>>>>>
>>>>> Now it shows a histogram, but most of the "terms" are long -- the full
>>>>> texts of (the table.column) eventlogtext.logtext, including the
>>>>> whitespace (with %0A used for newline characters)... So, it appears it
>>>>> is not being tokenized properly, correct?"
>>>>>
>>>>> Can you open from your Solr ui , the schema xml and show us the
>>>>> snippets for that field that seems to not tokenise ?
>>>>> Can you show us ( even a screenshot is fine) the schema browser page
>>>>> related ?
>>>>> Could be a problem of encoding ?
>>>>> Following Erick's details about the analysis, what are your results ?
>>>>>
>>>>> Cheers
>>>>>
>>>>> 2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:
>>>>>
>>>>>> typically, the index dir is inside the data dir. Delete the index dir
>>>>>> and you should be good. If there is a tlog next to it, you might want
>>>>>> to delete that also.
>>>>>>
>>>>>> If you don't have a data dir, I wonder whether you set the data dir
>>>>>> when creating your core or collection. Typically the instance dir and
>>>>>> data dir aren't needed.
>>>>>>
>>>>>> Upayavira
>>>>>>
>>>>>> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
>>>>>>
>>>>>>> OK, this is bizarre. You'd have had to set up SolrCloud by
>>>>>>> specifying the -zkRun command when you start Solr or the -zkHost;
>>>>>>> highly unlikely. On the admin page there would be a "cloud" link on
>>>>>>> the left side, I really doubt one's there.
>>>>>>>
>>>>>>> You should have a data directory, it should be the parent of the
>>>>>>> index and tlog directories. As a sanity check try looking at the
>>>>>>> analysis page. Type a bunch of words in the left hand side indexing
>>>>>>> box and uncheck the verbose box. As you can tell I'm grasping at
>>>>>>> straws. I'm still puzzled why you don't have a "data" directory here,
>>>>>>> but that shouldn't really matter. How did you create this index? I
>>>>>>> don't mean data import handler, more how did you create the core that
>>>>>>> you're indexing to?
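To make the symptom discussed in this thread concrete, here is a small Python sketch contrasting the index terms produced when a field is not tokenized with those produced when the text is split on whitespace and lowercased. This is only a conceptual illustration and not Solr's actual analysis chain; the function names and the sample log text are made up.

```python
def untokenized_terms(value: str) -> list:
    """An untokenized (string-like) field indexes the whole value as one term."""
    return [value]


def tokenized_terms(value: str) -> list:
    """A rough stand-in for a text field: split on whitespace, then lowercase."""
    return [token.lower() for token in value.split()]


# Hypothetical log entry containing a line feed, like the logtext column.
log_text = "River gauge offline\nTechnician notified"

print(untokenized_terms(log_text))  # a single long term, newline included
print(tokenized_terms(log_text))    # individual searchable words
```

With the first form, a single-word query can only match through wildcards, which is consistent with Mark's earlier need to wrap search terms in asterisks.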
>>>>>>>
>>>>>>> Best,
>>>>>>> Erick
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
>>>>>>> <mark.fenb...@noaa.gov> wrote:
>>>>>>>
>>>>>>>> On 9/23/2015 12:30 PM, Erick Erickson wrote:
>>>>>>>>
>>>>>>>>> Then my next guess is you're not pointing at the index you think
>>>>>>>>> you are when you 'rm -rf data'
>>>>>>>>>
>>>>>>>>> Just ignore the Elall field for now I should think, although get
>>>>>>>>> rid of it if you don't think you need it.
>>>>>>>>>
>>>>>>>>> DIH should be irrelevant here.
>>>>>>>>>
>>>>>>>>> So let's back up.
>>>>>>>>> 1> go ahead and "rm -fr data" (with Solr stopped).
>>>>>>>>
>>>>>>>> I have no "data" dir. Did you mean "index" dir? I removed 3
>>>>>>>> index directories (2 for spelling):
>>>>>>>> cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
>>>>>>>>
>>>>>>>>> 2> start Solr
>>>>>>>>> 3> do NOT re-index.
>>>>>>>>> 4> look at your index via the schema-browser. Of course there
>>>>>>>>>    should be nothing there!
>>>>>>>>
>>>>>>>> Correct! It said "there is no term info :("
>>>>>>>>
>>>>>>>>> 5> now kick off the DIH job and look again.
>>>>>>>>
>>>>>>>> Now it shows a histogram, but most of the "terms" are long -- the
>>>>>>>> full texts of (the table.column) eventlogtext.logtext, including
>>>>>>>> the whitespace (with %0A used for newline characters)... So, it
>>>>>>>> appears it is not being tokenized properly, correct?
>>>>>>>>
>>>>>>>>> Your logtext field should have only single tokens. The fact that
>>>>>>>>> you have some very long tokens (presumably with whitespace)
>>>>>>>>> indicates that you aren't really blowing the index away between
>>>>>>>>> indexing.
>>>>>>>>
>>>>>>>> Well, I did this time for sure. I verified that initially,
>>>>>>>> because it showed there was no term info until I DIH'd again.
>>>>>>>>
>>>>>>>>> Are you perhaps in Solr Cloud with more than one replica?
>>>>>>>>
>>>>>>>> Not that I know of, but being new to Solr, there could be things
>>>>>>>> going on that I'm not aware of. How can I tell? I certainly didn't
>>>>>>>> set anything up for solrCloud deliberately.
>>>>>>>>
>>>>>>>>> In that case you
>>>>>>>>> might be getting the index replicated on startup assuming you
>>>>>>>>> didn't blow away all replicas. If you are in SolrCloud, I'd just
>>>>>>>>> delete the collection and start over, after ensuring that you'd
>>>>>>>>> pushed the configset up to Zookeeper.
>>>>>>>>>
>>>>>>>>> BTW, I always look at the schema.xml file from the Solr admin
>>>>>>>>> window just as a sanity check in these situations.
>>>>>>>>
>>>>>>>> Good idea! But the one shown in the browser is identical to the
>>>>>>>> one I've been editing! So that's not an issue.
>>>>>>>>
>>>>> --
>>>>> --------------------------
>>>>>
>>>>> Benedetti Alessandro
>>>>> Visiting card - http://about.me/alessandro_benedetti
>>>>> Blog - http://alexbenedetti.blogspot.co.uk
>>>>>
>>>>> "Tyger, tyger burning bright
>>>>> In the forests of the night,
>>>>> What immortal hand or eye
>>>>> Could frame thy fearful symmetry?"
>>>>>
>>>>> William Blake - Songs of Experience -1794 England
>>>>> ________________________________
>>>>>
>>>>> Syngenta Limited, Registered in England No 2710846;Registered Office :
>>>>> Syngenta Limited, European Regional Centre, Priestley Road, Surrey
>>>>> Research Park, Guildford, Surrey, GU2 7YH, United Kingdom
>>>>> ________________________________
>>>>> This message may contain confidential information.
>>>>> If you are not the designated recipient, please notify the sender
>>>>> immediately, and delete the original and any copies. Any use of the
>>>>> message by you is prohibited.
>>>>>
>

--
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England