Re: Synonym format not working
Actual synonyms:

ccc => 1,2,ccc
ccc => 3

The result when I added &debugQuery=true is:

parsedquery: MultiPhraseQuery(all:" (1 ) (2 ccc ) 3")
parsedquery_toString: all:" (1 ) (2 ccc ) 3"
QParser: OldLuceneQParser

Otis Gospodnetic wrote:
>
> I can't see the problem at the moment. What do you see when you use
> &debugQuery=true in the URL? Do you see the query that includes synonyms?
> Can you give us the actual query and actual synonyms?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
>> From: prerna07 <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Friday, October 17, 2008 12:36:40 AM
>> Subject: Synonym format not working
>>
>> Hi,
>>
>> I am facing an issue with synonym search in Solr. The synonym.txt contains
>> the format:
>>
>> ccc => 1,2,ccc
>> ccc => 3
>>
>> I am not getting any search results for ccc. I have created indexes with
>> string value.
>>
>> Do I need to change anything in schema.xml?
>>
>> String tag from Schema.xml:
>>
>> omitNorms="true">
>> ignoreCase="true" expand="true"/>
>> words="stopwords.txt"/>
>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> protected="protwords.txt"/>
>>
>> Any pointers to solve the issue?
>>
>> Thanks,
>> Prerna
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Synonym--format-not-working-tp20026988p20026988.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Synonym--format-not-working-tp20026988p20027720.html
Sent from the Solr - User mailing list archive at Nabble.com.
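Since multiple synonyms.txt entries with the same left-hand side are merged, the two rules in this thread behave like one combined rule. A sketch of the equivalence (the rules are from the thread; the merged form follows the merging behavior described on the Solr SynonymFilterFactory wiki page):

```text
# these two rules...
ccc => 1,2,ccc
ccc => 3

# ...are merged and behave like the single rule:
ccc => 1,2,ccc,3
```

At query time "ccc" is then replaced by the alternatives 1, 2, ccc and 3, and a document matches only if one of those alternatives actually occurs in the index.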
Re: Synonym format not working
I can't see the problem at the moment. What do you see when you use &debugQuery=true in the URL? Do you see the query that includes synonyms? Can you give us the actual query and actual synonyms? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: prerna07 <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Friday, October 17, 2008 12:36:40 AM > Subject: Synonym format not working > > > Hi, > > I am facing issue in synonym search of solr. The synonym.txt contain the > format: > > ccc => 1,2,ccc > ccc => 3 > > I am not getting any search result for ccc. I have created indexes with > string value. > > Do i need to change anything in schema .xml ? > > String tag from Schema.xml : > > omitNorms="true"> > > > > ignoreCase="true" expand="true"/> > > words="stopwords.txt"/> > > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > > protected="protwords.txt"/> > > > > > Any pointers to solve the issue? > > Thanks, > Prerna > > > -- > View this message in context: > http://www.nabble.com/Synonym--format-not-working-tp20026988p20026988.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reduction of open files
Out of curiosity, how many files are held open when you hit the limit? What does ulimit show? And what does lsof show? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Paul deGrandis <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Thursday, October 16, 2008 3:28:29 PM > Subject: Reduction of open files > > I have been working with SOLR for a few months now. According to some > documentation I read, segment files only have one set of all the other > lingustic module type of stuff (normalization, frequency), is there a > way to remove/reduce the files not associated with a segment besides > optimizing the index? > > I set my mergeFactor to 2 for sake of trying to tease out a solution. > I have tried readercycle thinking it was just stale readers. That did > not work. > > If anyone has any experience or knows of any documentation that can > get me closer to achieving this, I would greatly appreciate it. > > Paul
Re: Different XML format for multi-valued fields?
The component that writes out the values does not know whether the field is multi-valued or not, so if it finds only a single value it writes it out as such.

On Thu, Oct 16, 2008 at 10:52 PM, oleg_gnatovskiy <[EMAIL PROTECTED]> wrote:
>
> Hello. I have an index built in Solr with several multi-valued fields. When
> the multi-valued field has only one value for a document, the XML returned
> looks like this:
>
> 5693
>
> However, when there are multiple values for the field, the XML looks like
> this:
>
> <arr name="someIds">
> 11199
> 1722
>
> Is there a reason for this difference? Also, how does faceting work with
> multi-valued fields? It seems that I sometimes get facet results from
> multi-valued fields, and sometimes I don't.
>
> Thanks.
> --
> View this message in context:
> http://www.nabble.com/Different-XML-format-for-multi-valued-fields--tp20015951p20015951.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

--
--Noble Paul
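For reference, the two response shapes described typically look like this in Solr's XML response writer (reconstructed for illustration; the field name someIds and the values are from the question, and the int element assumes an integer field type):

```xml
<!-- one value found: written out as a plain field -->
<int name="someIds">5693</int>

<!-- several values: wrapped in an array element -->
<arr name="someIds">
  <int>11199</int>
  <int>1722</int>
</arr>
```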
Re: RegexTransformer debugging (DIH)
If it is a normal exception, it is logged with the number of the document where it failed, and you can put it under a debugger with start=&rows=1. We do not catch a Throwable or Error, so it slips through. If you are adventurous enough, wrap the RegexTransformer with your own, apply that instead (say transformer="my.RegexWrapper"), catch Throwable, and print out the row.

On Thu, Oct 16, 2008 at 9:49 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Is there a way to prevent this from occurring (or a way to nail down the doc
> which is causing it?):
>
> INFO: [news] webapp=/solr path=/admin/dataimport params={command=status}
> status=0 QTime=0
> Exception in thread "Thread-14" java.lang.StackOverflowError
>    at java.util.regex.Pattern$Single.match(Pattern.java:3313)
>    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4763)
>    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>    at java.util.regex.Pattern$All.match(Pattern.java:4079)
>    at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
>    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
>    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
>    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>    at java.util.regex.Pattern$All.match(Pattern.java:4079)
>    at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
>    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
>    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
>    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>    at java.util.regex.Pattern$All.match(Pattern.java:4079)
>
> Thanks.
>
> - Jon
>

--
--Noble Paul
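The wrapper idea can be sketched like this. It is not a drop-in DIH transformer (that would need Solr's Transformer/Context API on the classpath, and the class name is only a hypothetical placeholder); it just illustrates catching Throwable so that a StackOverflowError, which is an Error and slips past ordinary exception handling, gets reported together with the offending row:

```java
import java.util.Map;
import java.util.function.Function;

public class SafeTransform {
    // Wrap a per-row transform so that anything thrown -- including
    // StackOverflowError, which is an Error and escapes ordinary
    // catch (Exception e) blocks -- is reported together with the row
    // that triggered it, instead of killing the indexing thread.
    public static Object apply(Function<Map<String, Object>, Object> inner,
                               Map<String, Object> row) {
        try {
            return inner.apply(row);
        } catch (Throwable t) {
            System.err.println("transform failed on row " + row + ": " + t);
            return null; // skip the bad row
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(r -> r.get("title"), Map.of("title", "ok")));
        // prints: ok
    }
}
```

A real wrapper would do the same around RegexTransformer.transformRow and re-register itself via the entity's transformer attribute.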
Re: dataimport, both splitBy and dateTimeFormat
Thanks David, I have updated the wiki documentation: http://wiki.apache.org/solr/DataImportHandler#transformer

The default transformers do not have any special privilege; they are like any normal user-provided transformer. We just identified some commonly found use cases and added transformers for them. Applying a transformer is not 'cheap': it has to do extra checks to know whether to apply or not.

On Fri, Oct 17, 2008 at 12:26 AM, David Smiley @MITRE.org <[EMAIL PROTECTED]> wrote:
>
> The wiki didn't mention I can specify multiple transformers. BTW, it's
> "transformer" (singular), not "transformers". I did mean both NFT and DFT
> because I was speaking of the general case, not just mine in particular. I
> thought that the built-in transformers were always in-effect and so I
> expected NFT,DFT to occur last. Sorry if I wasn't clear.
>
> Thanks for your help; it worked.
>
> ~ David
>
> Shalin Shekhar Mangar wrote:
>>
>> Hi David,
>>
>> I think you meant RegexTransformer instead of NumberFormatTransformer.
>> Anyhow, the order in which the transformers are applied is the same as the
>> order in which you specify them.
>>
>> So make sure your entity has
>> transformers="RegexTransformer,DateFormatTransformer".
>>
>> On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org
>> <[EMAIL PROTECTED]>wrote:
>>
>>> I'm trying out the dataimport capability. I have a column that is a
>>> series of dates separated by spaces like so:
>>> "1996-00-00 1996-04-00"
>>> And I'm trying to import it like so:
>>>
>>> However this fails and the stack trace suggests it is first trying to
>>> apply the dateTimeFormat before splitBy. I think this is a bug...
>>> dataimport should apply DateFormatTransformer and
>>> NumberFormatTransformer last.
>>>
>>> ~ David Smiley
>>> --
>>> View this message in context:
>>> http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> >>> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> >> > -- > View this message in context: > http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20016178.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- --Noble Paul
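A minimal entity sketch with both transformers in the working order (the entity, query, and column names are illustrative, not David's actual config):

```xml
<entity name="item"
        query="select id, dates from item"
        transformer="RegexTransformer,DateFormatTransformer">
  <!-- RegexTransformer's splitBy runs first, then DateFormatTransformer
       parses each resulting token with dateTimeFormat -->
  <field column="dates" splitBy=" " dateTimeFormat="yyyy-MM-dd"/>
</entity>
```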
Re: Tree Faceting Component
After a bit more investigating, it appears that any facet tree where the first item is numerical or boolean or some other non-textual type does not produce any secondary facets. This includes sint, sfloat, boolean and such. For instance, on the sample index:

facet.tree=sku,cat            => works
facet.tree=cat,sku            => works
facet.tree=manu_exact,cat     => works
facet.tree=cat,manu_exact     => works
facet.tree=popularity,inStock => fails
facet.tree=inStock,popularity => fails
facet.tree=manu_exact,weight  => works
facet.tree=weight,manu_exact  => fails

I'm not very familiar with the Solr / Lucene Java API, so this is slow going here. Maybe I'm barking up the wrong tree, but is the TermQuery for the secondary SimpleFacet messing up somehow? I tried to dig into the code, but was unsuccessful. It appears to me that the searcher never returns a docSet for any TermQuery where the field being searched has a type that is non-textual.

As a final test, I changed the schema and made the inStock field a 'text' field instead of 'boolean'. When I did that and reindexed the sample data, the tree facet worked correctly as either facet.tree=cat,inStock or facet.tree=inStock,cat, whereas before it would only work in the former.

enjoy,

-jeremy

On Thu, Oct 16, 2008 at 10:55:49AM -0600, Jeremy Hinegardner wrote:
> Erik,
>
> After some more experiments, I can get it to perform incorrectly using the
> sample solr data.
> > The example query from SOLR-792 ticket: > > http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=cat,inStock&wt=json&indent=on > > Make a few altertions to the query: > > 1) swap the tree order - all tree facets are 0 > > http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=inStock,cat&wt=json&indent=on > > 2) swap tree order and change facet.field to be the primary( inStock ) > > http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=inStock&facet.tree=inStock,cat&wt=json&indent=on > > Also, can tree faceting work distributed? > > enjoy, > > -jeremy > > On Wed, Oct 15, 2008 at 05:41:21PM -0700, Erik Hatcher wrote: > > Jeremy, > > > > What's the full request you're making to Solr? > > > > Do you get values when you facet normally on date_id and type? > > &facet.field=date_id&facet.field=type > > > > Erik > > > > p.s. this e-mail is not on the list (on a hotel net connection blocking > > outgoing mail) - feel free to reply to this back on the list though. > > > > On Oct 15, 2008, at 5:29 PM, Jeremy Hinegardner wrote: > > > >> Hi all, > >> > >> I'm testing out using the Tree Faceting Component (SOLR-792) on top of > >> Solr 1.3. > >> > >> It looks like it would do exactly what I want, but something is not > >> working > >> correctly with my schema. When I use the example schema, it works just > >> fine, > >> but I swap out the example schema's and example index and then put in my > >> index > >> and and schema, tree facet does not work. > >> > >> Both of the fields I want to facet can be faceted individually, but when I > >> say > >> facet.tree=date_id,type then all of the values are 0. > >> > >> Does anyone have any ideas on where I should start looking ? > >> > >> enjoy, > >> > >> -jeremy > >> > >> -- > >> > >> Jeremy Hinegardner [EMAIL PROTECTED] > > > > -- > > Jeremy Hinegardner [EMAIL PROTECTED] > -- Jeremy Hinegardner [EMAIL PROTECTED]
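If the component really only handles textual types, one workaround consistent with the observations in this thread (an untested sketch, not a confirmed fix) would be to tree-facet on string copies of the offending fields:

```xml
<!-- schema.xml: string shadows of the non-textual fields -->
<field name="popularity_s" type="string" indexed="true" stored="false"/>
<field name="inStock_s"    type="string" indexed="true" stored="false"/>

<copyField source="popularity" dest="popularity_s"/>
<copyField source="inStock"    dest="inStock_s"/>
```

After reindexing, facet.tree=popularity_s,inStock_s should then behave like the textual cases that already work.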
Re: error with delta import
The last_index_time is available only from the second run onwards; that is, it expects a full-import to have been done first. It knows that by the presence of dataimport.properties in the config directory. Did you check if it is present?

On Thu, Oct 16, 2008 at 5:33 PM, Florian Aumeier <[EMAIL PROTECTED]> wrote:
> Noble Paul നോബിള് नोब्ळ् schrieb:
>>>
>>> Well, when doing the way you described below (full-import with the delta
>>> query), the '${dataimporter.last_index_time}' timestamp is empty:
>>>
>>
>> I guess this was fixed post 1.3. Probably you can take
>> dataimporthandler.jar from a nightly build (you may also need to add
>> slf4j.jar)
>>
> I replaced
> dist/apache-solr-dataimporthandler-1.3.0.jar
> dist/solrj-lib/slf4j-api-1.5.3.jar
> dist/solrj-lib/slf4j-jdk14-1.5.3.jar
>
> with their counterparts from the nightly build, but it did not help. Then I
> tried to enter the date kind of hard coded (now() - '12 hours'::interval).
> Everything looks fine, but there are no new documents in the index.
>
> here is the log:
>
> INFO: Starting Full Import
> Oct 16, 2008 1:07:08 PM org.apache.solr.core.SolrCore execute
> INFO: [test] webapp=/solr path=/dataimport
> params={command=full-import&clean=false&entity=articles-delta} status=0 QTime=0
> Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
> INFO: Creating a connection for entity articles-delta with URL:
> jdbc:postgresql://bm02:5432/bm
> Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
> INFO: Time taken for getConnection(): 45
> Oct 16, 2008 1:14:53 PM org.apache.solr.core.SolrCore execute
> INFO: [test] webapp=/solr path=/dataimport params={} status=0 QTime=1
> Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
> INFO: Read dataimport.properties
> Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter persistStartTime
> INFO: Wrote last indexed time to dataimport.properties
> Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.DocBuilder commit
> INFO: Full Import completed successfully
> Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)
> Oct 16, 2008 1:16:11 PM org.apache.solr.search.SolrIndexSearcher
> INFO: Opening [EMAIL PROTECTED] main
> Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> ... (autowarming)
> Oct 16, 2008 1:16:12 PM org.apache.solr.handler.dataimport.DocBuilder execute
> INFO: Time taken = 0:9:3.231

--
--Noble Paul
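For reference, the delta-as-full-import pattern under discussion usually looks something like this (table and column names are made up; only the ${dataimporter.last_index_time} variable and the request parameters are from the thread):

```xml
<entity name="articles-delta"
        query="select id, title, body from articles
               where last_modified &gt; '${dataimporter.last_index_time}'"/>
```

It is invoked as /dataimport?command=full-import&clean=false&entity=articles-delta, and the variable is only filled in once a first full-import has written dataimport.properties.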
Synonym format not working
Hi,

I am facing an issue with synonym search in Solr. The synonym.txt contains the format:

ccc => 1,2,ccc
ccc => 3

I am not getting any search results for ccc. I have created indexes with string value.

Do I need to change anything in schema.xml?

String tag from Schema.xml:

Any pointers to solve the issue?

Thanks,
Prerna
--
View this message in context: http://www.nabble.com/Synonym--format-not-working-tp20026988p20026988.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: error with delta import
If you make a database view with the query, it is easy to examine the data you want to index. Then, your solr import query would just pull the view. The Solr setup file is much simpler this way. -Original Message- From: Noble Paul നോബിള് नोब्ळ् [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 15, 2008 2:46 AM To: solr-user@lucene.apache.org Subject: Re: error with delta import The delta implementation is a bit fragile in DIH for complex queries I recommend you do delta-import using a full-import .
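A sketch of that approach (all table, column, and view names here are hypothetical):

```sql
-- Flatten the join logic into one view...
CREATE VIEW solr_articles AS
SELECT a.id,
       a.title,
       a.body,
       f.name AS feed_name,
       a.last_modified
FROM articles a
JOIN feeds f ON f.id = a.feed_id;

-- ...so the import query in data-config.xml reduces to:
-- SELECT * FROM solr_articles
```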
Re: Reduction of open files
My biggest concern is why do the remaining files stay open even if my mergeFactor is 2. I would expect to see one or two segment files and one or two sets of accompanying file (.nrm, .frq, etc), based on the documentation. Paul On Thu, Oct 16, 2008 at 4:23 PM, Paul deGrandis <[EMAIL PROTECTED]> wrote: > I currently am not. > > The document collection is highly volatile (3000 modifications a > minute) and from reading thought it would be too much of a performance > penalty but never tested it. > > What behavior in terms of file creation and open fd is seen when > useCompoundFile is set to true? > > Paul > > > On Thu, Oct 16, 2008 at 4:16 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: >> Are you using the compound file format? >> >> -Grant >> >> On Oct 16, 2008, at 3:28 PM, Paul deGrandis wrote: >> >>> I have been working with SOLR for a few months now. According to some >>> documentation I read, segment files only have one set of all the other >>> lingustic module type of stuff (normalization, frequency), is there a >>> way to remove/reduce the files not associated with a segment besides >>> optimizing the index? >>> >>> I set my mergeFactor to 2 for sake of trying to tease out a solution. >>> I have tried readercycle thinking it was just stale readers. That did >>> not work. >>> >>> If anyone has any experience or knows of any documentation that can >>> get me closer to achieving this, I would greatly appreciate it. >>> >>> Paul >> >> -- >> Grant Ingersoll >> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. >> http://www.lucenebootcamp.com >> >> >> Lucene Helpful Hints: >> http://wiki.apache.org/lucene-java/BasicsOfPerformance >> http://wiki.apache.org/lucene-java/LuceneFAQ >> >> >> >> >> >> >> >> >> >> >
Re: Reduction of open files
I currently am not. The document collection is highly volatile (3000 modifications a minute) and from reading thought it would be too much of a performance penalty but never tested it. What behavior in terms of file creation and open fd is seen when useCompoundFile is set to true? Paul On Thu, Oct 16, 2008 at 4:16 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Are you using the compound file format? > > -Grant > > On Oct 16, 2008, at 3:28 PM, Paul deGrandis wrote: > >> I have been working with SOLR for a few months now. According to some >> documentation I read, segment files only have one set of all the other >> lingustic module type of stuff (normalization, frequency), is there a >> way to remove/reduce the files not associated with a segment besides >> optimizing the index? >> >> I set my mergeFactor to 2 for sake of trying to tease out a solution. >> I have tried readercycle thinking it was just stale readers. That did >> not work. >> >> If anyone has any experience or knows of any documentation that can >> get me closer to achieving this, I would greatly appreciate it. >> >> Paul > > -- > Grant Ingersoll > Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. > http://www.lucenebootcamp.com > > > Lucene Helpful Hints: > http://wiki.apache.org/lucene-java/BasicsOfPerformance > http://wiki.apache.org/lucene-java/LuceneFAQ > > > > > > > > > >
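For reference, the compound file format is toggled in solrconfig.xml; when enabled, each newly written segment's per-extension files (.frq, .prx, .nrm, ...) are packed into a single .cfs file, which cuts the open-file count at some indexing-speed cost. A sketch against the Solr 1.3 config layout:

```xml
<indexDefaults>
  <useCompoundFile>true</useCompoundFile>
  <mergeFactor>2</mergeFactor>
</indexDefaults>
<mainIndex>
  <useCompoundFile>true</useCompoundFile>
</mainIndex>
```

Only segments written after the change use the compound format, so the effect shows up as the index turns over or after an optimize.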
Re: Reduction of open files
Are you using the compound file format? -Grant On Oct 16, 2008, at 3:28 PM, Paul deGrandis wrote: I have been working with SOLR for a few months now. According to some documentation I read, segment files only have one set of all the other lingustic module type of stuff (normalization, frequency), is there a way to remove/reduce the files not associated with a segment besides optimizing the index? I set my mergeFactor to 2 for sake of trying to tease out a solution. I have tried readercycle thinking it was just stale readers. That did not work. If anyone has any experience or knows of any documentation that can get me closer to achieving this, I would greatly appreciate it. Paul -- Grant Ingersoll Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. http://www.lucenebootcamp.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Reduction of open files
I have been working with SOLR for a few months now. According to some documentation I read, segment files only have one set of all the other linguistic module type of stuff (normalization, frequency). Is there a way to remove/reduce the files not associated with a segment besides optimizing the index?

I set my mergeFactor to 2 for the sake of trying to tease out a solution. I have tried readercycle thinking it was just stale readers. That did not work.

If anyone has any experience or knows of any documentation that can get me closer to achieving this, I would greatly appreciate it.

Paul
Re: dataimport, both splitBy and dateTimeFormat
The wiki didn't mention I can specify multiple transformers. BTW, it's "transformer" (singular), not "transformers". I did mean both NFT and DFT because I was speaking of the general case, not just mine in particular. I thought that the built-in transformers were always in-effect and so I expected NFT,DFT to occur last. Sorry if I wasn't clear. Thanks for your help; it worked. ~ David Shalin Shekhar Mangar wrote: > > Hi David, > > I think you meant RegexTransformer instead of NumberFormatTransformer. > Anyhow, the order in which the transformers are applied is the same as the > order in which you specify them. > > So make sure your entity has > transformers="RegexTransformer,DateFormatTransformer". > > On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org > <[EMAIL PROTECTED]>wrote: > >> >> I'm trying out the dataimport capability. I have a column that is a >> series >> of dates separated by spaces like so: >> "1996-00-00 1996-04-00" >> And I'm trying to import it like so: >> >> >> However this fails and the stack trace suggests it is first trying to >> apply >> the dateTimeFormat before splitBy. I think this is a bug... dataimport >> should apply DateFormatTransformer and NumberFormatTransformer last. >> >> ~ David Smiley >> -- >> View this message in context: >> http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20016178.html Sent from the Solr - User mailing list archive at Nabble.com.
Different XML format for multi-valued fields?
Hello. I have an index built in Solr with several multi-valued fields. When the multi-valued field has only one value for a document, the XML returned looks like this:

5693

However, when there are multiple values for the field, the XML looks like this:

<arr name="someIds">
11199
1722

Is there a reason for this difference? Also, how does faceting work with multi-valued fields? It seems that I sometimes get facet results from multi-valued fields, and sometimes I don't.

Thanks.
--
View this message in context: http://www.nabble.com/Different-XML-format-for-multi-valued-fields--tp20015951p20015951.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tree Faceting Component
Erik,

After some more experiments, I can get it to perform incorrectly using the sample solr data.

The example query from the SOLR-792 ticket:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=cat,inStock&wt=json&indent=on

Make a few alterations to the query:

1) swap the tree order - all tree facets are 0

http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=inStock,cat&wt=json&indent=on

2) swap tree order and change facet.field to be the primary (inStock)

http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=inStock&facet.tree=inStock,cat&wt=json&indent=on

Also, can tree faceting work distributed?

enjoy,

-jeremy

On Wed, Oct 15, 2008 at 05:41:21PM -0700, Erik Hatcher wrote:
> Jeremy,
>
> What's the full request you're making to Solr?
>
> Do you get values when you facet normally on date_id and type?
> &facet.field=date_id&facet.field=type
>
>        Erik
>
> p.s. this e-mail is not on the list (on a hotel net connection blocking
> outgoing mail) - feel free to reply to this back on the list though.
>
> On Oct 15, 2008, at 5:29 PM, Jeremy Hinegardner wrote:
>
>> Hi all,
>>
>> I'm testing out using the Tree Faceting Component (SOLR-792) on top of
>> Solr 1.3.
>>
>> It looks like it would do exactly what I want, but something is not
>> working correctly with my schema. When I use the example schema, it works
>> just fine, but when I swap out the example schema and example index and
>> put in my index and schema, tree faceting does not work.
>>
>> Both of the fields I want to facet can be faceted individually, but when
>> I say facet.tree=date_id,type then all of the values are 0.
>>
>> Does anyone have any ideas on where I should start looking?
>>
>> enjoy,
>>
>> -jeremy
>>
>> --
>> Jeremy Hinegardner        [EMAIL PROTECTED]

--
Jeremy Hinegardner        [EMAIL PROTECTED]
RegexTransformer debugging (DIH)
Is there a way to prevent this from occurring (or a way to nail down the doc which is causing it?):

INFO: [news] webapp=/solr path=/admin/dataimport params={command=status} status=0 QTime=0
Exception in thread "Thread-14" java.lang.StackOverflowError
    at java.util.regex.Pattern$Single.match(Pattern.java:3313)
    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4763)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
    at java.util.regex.Pattern$All.match(Pattern.java:4079)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
    at java.util.regex.Pattern$All.match(Pattern.java:4079)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
    at java.util.regex.Pattern$All.match(Pattern.java:4079)

Thanks.

- Jon
Re: updating documents in solr 1.3.0
This is being worked on for Solr 1.4: https://issues.apache.org/jira/browse/SOLR-139

Bill

On Wed, Oct 15, 2008 at 7:47 PM, Walter Underwood <[EMAIL PROTECTED]> wrote:
> Neither Solr nor Lucene supports partial updates. "Update" means
> "add or replace". --wunder
>
> On 10/15/08 4:23 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> I've been trying to find a way to post partial updates, updating only
>> some of the fields in a set of records, via POSTed XML messages to a solr
>> 1.3.0 index. In the wiki (http://wiki.apache.org/solr/UpdateXmlMessages),
>> it almost seems like there's a special root tag which isn't
>> mentioned anywhere else. Am I correct in assuming that no such tag
>> exists?
>>
>> Thanks in advance,
>>
>> Evan Kelsey
Re: How to retrieve all field names of index of one type
Hi, I don't have the sources handy, but look at the Luke request handler in Solr sources and you'll see how it can be done. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: prerna07 <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Wednesday, October 15, 2008 2:51:48 AM > Subject: How to retrieve all field names of index of one type > > > Hi, > > I want to retrieve all field names of one index type, is there any way solr > can do this? > > For example: I have 3 index with the field name and value : > ProductVO > I want to retrieve all other field names present in the indexes which have > field name as index_type and value as "ProductVO". > > Please let me know if you need more details. > > Thanks, > Prerna > -- > View this message in context: > http://www.nabble.com/How-to-retrieve-all-field-names-of-index-of-one-type-tp19987807p19987807.html > Sent from the Solr - User mailing list archive at Nabble.com.
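For reference, the Luke request handler is mapped at /admin/luke in the Solr 1.3 example config, so the per-field information can also be inspected directly over HTTP before writing any code, e.g.:

```text
http://localhost:8983/solr/admin/luke?numTerms=0
```

(numTerms=0 skips the per-field top-terms computation and just lists the fields.)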
Re: Advice on analysis/filtering?
Jarek Zgoda wrote:
> Message written on 2008-10-16, at 16:21, by Grant Ingersoll:
>>> I'm trying to create a search facility for documents in "broken"
>>> Polish (by broken I mean "not language rules compliant"),
>> Can you explain what you mean here a bit more? I don't know Polish,

Hi guys, I do speak Polish :) maybe I can help here a bit.

> Some documents (around 15% of the whole pile) contain texts entered by
> children from primary schools, and that implies many syntactic and
> orthographic errors.
>
> document text: "włatcy móch" (in proper Polish this would be "władcy much")
> example terms that should match: "włatcy much", "wlatcy moch", "wladcy much"

These examples can be classified as "sounds like", and typically soundexing algorithms are used to address this problem, in order to generate initial suggestions. After that you can use other heuristic rules to select the most probable correct forms. AFAIK, there are no (public) soundex implementations for Polish, in particular in Java, although there was some research work done on the construction of a specifically Polish soundex. You could also use the Daitch-Mokotoff soundex, which comes close enough.

> Taking word "włatcy" from my example, I'd like to find documents
> containing words "wlatcy" (latin-2 accentuations stripped from original),

This step is trivial.

> "władcy" (proper form of this noun) and "wladcy" (latin-2 accents
> stripped from proper form).

And this one is not. It requires using something like soundexing in order to look up possible similar terms. However ... in this process you inevitably collect false positives, and you don't have any way in the input text to determine that they should be rejected. You can only make this decision based on some external knowledge of Polish, such as:

* a morpho-syntactic analyzer that will determine which combinations of suggestions are more correct and more probable,
* a language model that for any given soundexed phrase can generate the most probable original phrases.
Also, knowing the context in which a query is asked may help, but usually you don't have this information (queries are short). -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
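The "trivial" accent-stripping step can be sketched in Java like this (a standalone illustration, not Solr filter code). One Polish-specific wrinkle: unlike ą, ó, ś, or ż, the letter ł has no combining-mark decomposition, so Unicode normalization alone will not strip it:

```java
import java.text.Normalizer;

public class AccentStripper {
    // NFD decomposition splits accented letters (ó, ą, ś, ż, ...) into a
    // base letter plus a combining mark, which \p{M} then removes.
    // Polish ł/Ł is the exception: it has no canonical decomposition,
    // so it is mapped to plain l/L explicitly.
    public static String strip(String s) {
        String mapped = s.replace('ł', 'l').replace('Ł', 'L');
        return Normalizer.normalize(mapped, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("włatcy móch")); // prints: wlatcy moch
    }
}
```

Inside Solr this logic would live in an analysis filter applied at both index and query time, so "włatcy" and "wlatcy" collapse to the same term; the soundexing step for the ortographic variants is the genuinely hard part.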
Re: How Synonyms work in Solr
Hi,

It looks like you have not seen a pretty detailed page on Synonyms on the Solr wiki. Have a look, I think you'll find answers to your questions there.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: payalsharma <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 16, 2008 9:55:55 AM
> Subject: How Synonyms work in Solr
>
> Hi,
>
> Please explain how the below mentioned synonym patterns work in Solr
> search, as there exist several separators for synonym patterns:
>
> 1.
>
> #Explicit mappings match any token sequence on the LHS of "=>"
> #and replace with all alternatives on the RHS. These types of mappings
> #ignore the expand parameter in the schema.
>
> #Examples:
> i-pod, i pod => ipod,
> sea biscuit, sea biscit => seabiscuit
>
> 2.
>
> #Equivalent synonyms may be separated with commas and give
> #no explicit mapping. In this case the mapping behavior will
> #be taken from the expand parameter in the schema. This allows
> #the same synonym file to be used in different synonym handling strategies.
>
> #Examples:
> ipod, i-pod, i pod
> foozball, foosball
> universe, cosmos
>
> 3.
> # If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
> ipod, i-pod, i pod => ipod, i-pod, i pod
> # If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
> ipod, i-pod, i pod => ipod
>
> 4.
> #multiple synonym mapping entries are merged.
> foo => foo bar
> foo => baz
> #is equivalent to
> foo => foo bar, baz
>
> 5.
> Explain the meaning of this pattern:
>
> a\=>a => b\=>b
> a\,a => b\,b
>
> Questions:
>
> A) Among the following, which characters work as delimiters:
> whitespace (" "), comma (","), "=>", "\", "/"
> B) Also, please let us know whether there exist certain other patterns
> apart from the above mentioned ones.
> C) In the pattern : ipod, i-pod, i pod >Here how we will determine that "i pod" has to be treated as a single > word though it contains Whitespace. > -- > View this message in context: > http://www.nabble.com/How-Synonyms-work-in-Solr-tp20014192p20014192.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: snapshooter and spellchecker
Geoff, maybe this will help: https://issues.apache.org/jira/browse/SOLR-433 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Geoffrey Young <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Thursday, October 16, 2008 10:34:40 AM > Subject: snapshooter and spellchecker > > hi all :) > > I was surprised to find that snapshooter didn't account for the > spellcheck dictionary. but then again, since you can call it whatever > you want I guess it couldn't. > > so, how are people distributing their dictionaries across their slaves? > since it takes so long to generate, I can't see it being practical to > generate it on each slave, especially as they'd all have the same data > as the master anyway. > > tia > > --Geoff
Re[2]: How to change a port?
Hello Ryan, Thats exactly what I was looking for. Thanks! RM> that will depend on your servlet container. (jetty, resin, tomcat, RM> etc...) RM> If you are running jetty from the example, you can change the port by RM> adding -Djetty.port=1234 to the command line. The port is configured RM> in example/etc/jetty.xml RM> the relevant line is: RM> > RM> ryan RM> On Oct 16, 2008, at 10:30 AM, Aleksey Gogolev wrote: >> >> Hello. >> >> Is there a way to change the port (8983) of solr example? >> I want to run two solr examples simultaneously. >> >> -- >> Aleksey Gogolev >> developer, >> dev.co.ua >> Aleksey >> RM> __ NOD32 3528 (20081016) Information __ RM> This message was checked by NOD32 antivirus system. RM> http://www.eset.com -- Aleksey Gogolev developer, dev.co.ua Aleksey mailto:[EMAIL PROTECTED]
Re: Advice on analysis/filtering?
Message written on 2008-10-16 at 16:21 by Grant Ingersoll:

>> I'm trying to create a search facility for documents in "broken" Polish (by broken I mean "not language rules compliant"),
>
> Can you explain what you mean here a bit more? I don't know Polish, but most spoken languages can't be pinned down to a specific set of rules. In other words, the exception is the rule. Or, are you saying the documents are more dialog based, i.e. more informal, as in two people having a conversation?

Some documents (around 15% of the whole pile) contain texts entered by children from primary school, and that implies many syntactic and orthographic errors. The text is indexed "as is" and Solr is able to find exact occurrences, but I'd like to also be able to find documents that contain other variations of the errors, and the proper forms, too. And oh, the system will be used by children of the same age, who tend to make similar errors when entering search terms.

>> searchable by terms in "broken" Polish, but broken in many other ways than the documents. See this example:
>>
>> document text: "włatcy móch" (in proper Polish this would be "władcy much")
>> example terms that should match: "włatcy much", "wlatcy moch", "wladcy much"
>>
>> This double brokenness ruled out any Polish stemmers currently available for Lucene and now I am at point 0. The search results do not have to be 100% accurate - some missing results are acceptable, but "false positives" are not.
>
> There's no such thing in any language. In your example above, what is matching that shouldn't? Is this happening across a lot of documents, or just a few?

Yeah, I know that. By "not acceptable" I mean "not acceptable above some level". Sorry for the confusion. Taking the word "włatcy" from my example, I'd like to find documents containing the words "wlatcy" (Latin-2 accents stripped from the original), "władcy" (the proper form of this noun) and "wladcy" (Latin-2 accents stripped from the proper form).

Issue #1 (stripping accents from the original) seems resolvable outside Solr - I can index the texts with accents already stripped. Issue #2 (finding the proper form of a word) is the most interesting for me. Issue #3 depends on #1 and #2.

>> Is it at all possible using the machinery provided by Solr (I do not own a PhD in linguistics), or should I ask the business to lower their expectations?
>
> Well, I think there are a couple of approaches:
> 1. You can write your own filter/stemmer/analyzer that you think fixes these issues
> 2. You can protect the "broken" words and not have them filtered, or filter them differently.
> 3. You can lower expectations.
>
> One thing to try out is Solr's analysis tool in the admin, and see if you can get a better handle on what is going wrong.

I'll see how far I can go with the spellchecker and fuzzy searches.

--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]
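For issue #1, the accent stripping can also happen inside the analyzer chain rather than as a preprocessing step. A sketch of such a field type (the field type name is illustrative; note that the ISOLatin1AccentFilterFactory shipped with Solr 1.3 only folds Latin-1 characters, so Polish letters such as "ł" need either the ASCIIFoldingFilterFactory from later releases or a custom filter):

```xml
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- ASCIIFoldingFilterFactory (newer Solr releases) folds Latin
         Extended-A characters, e.g. "włatcy" -> "wlatcy"; because the
         same chain runs at index and query time, both sides of the
         match are normalized the same way -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

This only covers the accent variants, not issue #2 (mapping "włatcy" to the proper form "władcy"), which still needs a spellchecker-style or fuzzy approach.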
Re: How to change a port?
that will depend on your servlet container. (jetty, resin, tomcat, etc...) If you are running jetty from the example, you can change the port by adding -Djetty.port=1234 to the command line. The port is configured in example/etc/jetty.xml the relevant line is: > ryan On Oct 16, 2008, at 10:30 AM, Aleksey Gogolev wrote: Hello. Is there a way to change the port (8983) of solr example? I want to run two solr examples simultaneously. -- Aleksey Gogolev developer, dev.co.ua Aleksey
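The jetty.xml line referenced above was stripped by the list archive; in the stock Solr example it is a Set call along these lines (the default being the 8983 the example ships with):

```xml
<!-- example/etc/jetty.xml: the connector port honors -Djetty.port -->
<Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
```

So a second instance can be started from a separate copy of the example directory with something like `java -Djetty.port=8984 -jar start.jar` (a separate copy is needed so the two instances don't fight over the same index directory).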
snapshooter and spellchecker
hi all :) I was surprised to find that snapshooter didn't account for the spellcheck dictionary. but then again, since you can call it whatever you want I guess it couldn't. so, how are people distributing their dictionaries across their slaves? since it takes so long to generate, I can't see it being practical to generate it on each slave, especially as they'd all have the same data as the master anyway. tia --Geoff
How to change a port?
Hello. Is there a way to change the port (8983) of the Solr example? I want to run two Solr examples simultaneously. -- Aleksey Gogolev developer, dev.co.ua
Re: Advice on analysis/filtering?
On Oct 16, 2008, at 3:07 AM, Jarek Zgoda wrote: Hello, group. I'm trying to create a search facility for documents in "broken" Polish (by broken I mean "not language rules compliant"), Can you explain what you mean here a bit more? I don't know Polish, but most spoken languages can't be pinned down to a specific set of rules. In other words, the exception is the rule. Or, are you saying the documents use more dialog based, i.e. more informal, as in two people having a conversation? searchable by terms in "broken" Polish, but broken in many other ways than documents. See this example: document text: "włatcy móch" (in proper Polish this would be "władcy much") example terms that should match: "włatcy much", "wlatcy moch", "wladcy much" This double brokeness ruled out any Polish stemmers currently available for Lucene and now I am at point 0. The search results do not have to be 100% accurate - some missing results are acceptable, but "false positives" are not. There's no such thing in any language. In your example above, what is matching that shouldn't? Is this happening across a lot of documents, or just a few? Is it at all possible using machinery provided by Solr (I do not own PHD in liguistics), or should I ask the business for lowering their expectations? Well, I think there are a couple of approaches: 1. You can write your own filter/stemmer/analyzer that you think fixes these issues 2. You can protect the "broken" words and not have them filtered, or filter them differently. 3. You can lower expectations. One thing to try out is Solr's analysis tool in the admin, and see if you can get a better handle on what is going wrong. -- Grant Ingersoll Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. http://www.lucenebootcamp.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Advice on analysis/filtering?
You're welcome. I should have pointed out that I was responding mostly to the "false hits are not acceptable" portion, which I don't think is achievable Best Erick 2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]> > Wiadomość napisana w dniu 2008-10-16, o godz. 15:54, przez Erick Erickson: > > Well, let me see. Your customers are telling you, in essence, >> "for any random input, you cannot return false positives". Which >> is nonsense, so I'd say you need to negotiate with your >> customers. I flat guarantee that, for any algorithm you try, >> you can write a counter-example in, oh, 15 seconds or so . >> > > They came to such expectations seeing Solr's own Spellcheck at work - if it > can suggest correct versions, it should be able to sanitize broken words in > documents and search them using sanitized input. For me, this seemed > reasonable request (of course, if this can be achieved reasonably abusing > solr's spellcheck component). > > FuzzySearch tries to do some of this work for you, and that may be >> acceptable, as this is a common issue. But it'll never be >> perfect. >> >> You might get some joy from ngrams, but I haven't >> worked with it myself, just seen it recommended by people >> whose opinions I respect... >> > > Thank you for these suggestions. > > > >> >> Best >> Erick >> >> >> 2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]> >> >> Hello, group. >>> >>> I'm trying to create a search facility for documents in "broken" Polish >>> (by >>> broken I mean "not language rules compliant"), searchable by terms in >>> "broken" Polish, but broken in many other ways than documents. See this >>> example: >>> >>> document text: "włatcy móch" (in proper Polish this would be "władcy >>> much") >>> example terms that should match: "włatcy much", "wlatcy moch", "wladcy >>> much" >>> >>> This double brokeness ruled out any Polish stemmers currently available >>> for >>> Lucene and now I am at point 0. 
The search results do not have to be 100% >>> accurate - some missing results are acceptable, but "false positives" are >>> not. Is it at all possible using machinery provided by Solr (I do not own >>> PHD in liguistics), or should I ask the business for lowering their >>> expectations? >>> >>> -- >>> We read Knuth so you don't have to. - Tim Peters >>> >>> Jarek Zgoda, R&D, Redefine >>> [EMAIL PROTECTED] >>> >>> >>> > -- > We read Knuth so you don't have to. - Tim Peters > > Jarek Zgoda, R&D, Redefine > [EMAIL PROTECTED] > >
Re: Advice on analysis/filtering?
Wiadomość napisana w dniu 2008-10-16, o godz. 15:54, przez Erick Erickson: Well, let me see. Your customers are telling you, in essence, "for any random input, you cannot return false positives". Which is nonsense, so I'd say you need to negotiate with your customers. I flat guarantee that, for any algorithm you try, you can write a counter-example in, oh, 15 seconds or so . They came to such expectations seeing Solr's own Spellcheck at work - if it can suggest correct versions, it should be able to sanitize broken words in documents and search them using sanitized input. For me, this seemed reasonable request (of course, if this can be achieved reasonably abusing solr's spellcheck component). FuzzySearch tries to do some of this work for you, and that may be acceptable, as this is a common issue. But it'll never be perfect. You might get some joy from ngrams, but I haven't worked with it myself, just seen it recommended by people whose opinions I respect... Thank you for these suggestions. Best Erick 2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]> Hello, group. I'm trying to create a search facility for documents in "broken" Polish (by broken I mean "not language rules compliant"), searchable by terms in "broken" Polish, but broken in many other ways than documents. See this example: document text: "włatcy móch" (in proper Polish this would be "władcy much") example terms that should match: "włatcy much", "wlatcy moch", "wladcy much" This double brokeness ruled out any Polish stemmers currently available for Lucene and now I am at point 0. The search results do not have to be 100% accurate - some missing results are acceptable, but "false positives" are not. Is it at all possible using machinery provided by Solr (I do not own PHD in liguistics), or should I ask the business for lowering their expectations? -- We read Knuth so you don't have to. - Tim Peters Jarek Zgoda, R&D, Redefine [EMAIL PROTECTED] -- We read Knuth so you don't have to. 
- Tim Peters Jarek Zgoda, R&D, Redefine [EMAIL PROTECTED]
How Synonyms work in Solr
Hi,

Please explain how the below-mentioned synonym patterns work in Solr search, as there exist several separators for synonym patterns:

1.
#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS. These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

2.
#Equivalent synonyms may be separated with commas and give
#no explicit mapping. In this case the mapping behavior will
#be taken from the expand parameter in the schema. This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos

3.
# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod

4.
#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz

5.
Explain the meaning of this pattern:
a\=>a => b\=>b
a\,a => b\,b

Questions:

A) Among the following, which characters work as delimiters: whitespace (" "), comma (","), "=>", "\", "/"?
B) Also, please let us know whether there exist other patterns apart from the ones mentioned above.
C) In the pattern "ipod, i-pod, i pod", how will we determine that "i pod" has to be treated as a single word even though it contains whitespace?

--
View this message in context: http://www.nabble.com/How-Synonyms-work-in-Solr-tp20014192p20014192.html
Sent from the Solr - User mailing list archive at Nabble.com.
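For reference on pattern 5 and question A: in the SynonymFilterFactory file format the backslash escapes the character after it, so an escaped delimiter becomes part of the token itself. An annotated sketch of the entries from the question:

```
# In synonyms.txt, whitespace and "," separate tokens, and "=>" separates
# the left-hand side from the right-hand side; lines starting with "#" are
# comments. "\" escapes the delimiter that follows it:

# the single token "a=>a" maps to the single token "b=>b"
a\=>a => b\=>b
# the single token "a,a" maps to the single token "b,b"
a\,a => b\,b

# "/" has no special meaning here. A multi-word entry such as "i pod" is
# kept as a sequence of tokens: at analysis time the filter matches the
# consecutive tokens "i" "pod", not one whitespace-containing token.
```

This also answers question C: "i pod" is never one token; the filter matches it as a token sequence produced by the tokenizer.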
Re: Advice on analysis/filtering?
Well, let me see. Your customers are telling you, in essence, "for any random input, you cannot return false positives". Which is nonsense, so I'd say you need to negotiate with your customers. I flat guarantee that, for any algorithm you try, you can write a counter-example in, oh, 15 seconds or so . I think the best you can hope for is "reasonable results", but getting your customers to agree to what is "reasonable" is...er... often a challenge. Frequently when confronted by "close but not perfect", customers aren't as unforgiving as their first position would indicate since the inconvenience of the not- quite-perfect results is often much less than people think when starting out. FuzzySearch tries to do some of this work for you, and that may be acceptable, as this is a common issue. But it'll never be perfect. You might get some joy from ngrams, but I haven't worked with it myself, just seen it recommended by people whose opinions I respect... Best Erick 2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]> > Hello, group. > > I'm trying to create a search facility for documents in "broken" Polish (by > broken I mean "not language rules compliant"), searchable by terms in > "broken" Polish, but broken in many other ways than documents. See this > example: > > document text: "włatcy móch" (in proper Polish this would be "władcy much") > example terms that should match: "włatcy much", "wlatcy moch", "wladcy > much" > > This double brokeness ruled out any Polish stemmers currently available for > Lucene and now I am at point 0. The search results do not have to be 100% > accurate - some missing results are acceptable, but "false positives" are > not. Is it at all possible using machinery provided by Solr (I do not own > PHD in liguistics), or should I ask the business for lowering their > expectations? > > -- > We read Knuth so you don't have to. - Tim Peters > > Jarek Zgoda, R&D, Redefine > [EMAIL PROTECTED] > >
Re: dataimport, both splitBy and dateTimeFormat
Hi David, I think you meant RegexTransformer instead of NumberFormatTransformer. Anyhow, the order in which the transformers are applied is the same as the order in which you specify them. So make sure your entity has transformers="RegexTransformer,DateFormatTransformer". On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org <[EMAIL PROTECTED]>wrote: > > I'm trying out the dataimport capability. I have a column that is a series > of dates separated by spaces like so: > "1996-00-00 1996-04-00" > And I'm trying to import it like so: > > > However this fails and the stack trace suggests it is first trying to apply > the dateTimeFormat before splitBy. I think this is a bug... dataimport > should apply DateFormatTransformer and NumberFormatTransformer last. > > ~ David Smiley > -- > View this message in context: > http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
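A sketch of the entity configuration described above (entity name, query, and date pattern are illustrative; the attribute is spelled `transformer` in the DIH examples, and the point is that RegexTransformer is listed before DateFormatTransformer so splitBy runs before dateTimeFormat):

```xml
<entity name="item" query="select id, release_dates from item"
        transformer="RegexTransformer,DateFormatTransformer">
  <!-- RegexTransformer's splitBy first splits the space-separated string
       into individual values; DateFormatTransformer's dateTimeFormat then
       parses each one -->
  <field column="release_dates" splitBy=" " dateTimeFormat="yyyy-MM-dd"/>
</entity>
```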
dataimport, both splitBy and dateTimeFormat
I'm trying out the dataimport capability. I have a column that is a series of dates separated by spaces like so: "1996-00-00 1996-04-00" And I'm trying to import it like so: However this fails and the stack trace suggests it is first trying to apply the dateTimeFormat before splitBy. I think this is a bug... dataimport should apply DateFormatTransformer and NumberFormatTransformer last. ~ David Smiley -- View this message in context: http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: snapcleaner >> problem solr 1.3
On Oct 16, 2008, at 4:29 AM, sunnyfr wrote: still nothing changed : It looks like it worked better to me, in that it resulted in a valid find command for any snapshots with an -mtime of +1: ++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime +1 -print instead of showing an error like before: ++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime +-1 -print find: invalid argument `+-1' to `-mtime' But it didn't find any snapshots to remove. Do you have any snapshots that haven't been modified in 2+ days? Due to the way find - mtime works (looking at the modification day, and ignoring fractions of days), for a snapshot to match, it would have to not have been modified for a couple days. -Chris
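The rounding Chris describes can be demonstrated directly with GNU touch and find (temporary files, illustrative names): a file must be at least 48 hours old before `-mtime +1` matches it, because find truncates the age to whole 24-hour periods before comparing.

```shell
# GNU find truncates file age to whole days, so "-mtime +1" means
# "strictly more than 1 whole day old", i.e. 48 hours or older.
dir=$(mktemp -d)
touch -d "36 hours ago" "$dir/snapshot.36h"   # age truncates to 1 day
touch -d "60 hours ago" "$dir/snapshot.60h"   # age truncates to 2 days
matched=$(find "$dir" -maxdepth 1 -name 'snapshot.*' -mtime +1 -printf '%f\n')
echo "$matched"   # -> snapshot.60h (1 is not > 1, so the 36h file is skipped)
rm -rf "$dir"
```

This is why a snapshot modified yesterday can survive `snapcleaner -D 1`: its age still truncates to 1 day.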
Re: error with delta import
Noble Paul നോബിള് नोब्ळ् schrieb: Well, when doing the way you described below (full-import with the delta query), the '${dataimporter.last_index_time}' timestamp is empty: I guess this was fixed post 1.3 . probably you can take dataimporthandler.jar from a nightly build (you may also need to add slf4j.jar) I replaced dist/apache-solr-dataimporthandler-1.3.0.jar dist/solrj-lib/slf4j-api-1.5.3.jar dist/solrj-lib/slf4j-jdk14-1.5.3.jar with their counterparts from the nightly build, but it did not help. Then I tried to enter the date kind of hard coded (now() - '12 hours'::interval). Everything looks fine, but there are no new documents in the index. here is the log: INFO: Starting Full Import Oct 16, 2008 1:07:08 PM org.apache.solr.core.SolrCore executeINFO: [test] webapp=/solr path=/dataimport params={command=full-import&clean=false&entity=articles-delta} status=0 QTime=0 Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity articles-delta with URL: jdbc:postgresql://bm02:5432/bm Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 callINFO: Time taken for getConnection(): 45 Oct 16, 2008 1:14:53 PM org.apache.solr.core.SolrCore execute INFO: [test] webapp=/solr path=/dataimport params={} status=0 QTime=1 Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerPropertiesINFO: Read dataimport.properties Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter persistStartTime INFO: Wrote last indexed time to dataimport.properties Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.DocBuilder commitINFO: Full Import completed successfullyOct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)Oct 16, 2008 1:16:11 PM org.apache.solr.search.SolrIndexSearcher INFO: Opening [EMAIL PROTECTED] mainOct 16, 2008 1:16:11 PM 
org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush ... (autowarming) Oct 16, 2008 1:16:12 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:9:3.231
Re: snapcleaner >> problem solr 1.3
still nothing changed : [EMAIL PROTECTED]:/data/solr/video# ./bin/snapcleaner -V -D 1 + [[ -z 1 ]] + fixUser -V -D 1 + [[ -z '' ]] ++ whoami + user=root ++ whoami + [[ root != root ]] ++ who -m ++ cut '-d ' -f1 ++ sed '-es/^.*!//' + oldwhoami=root + [[ root == '' ]] + [[ -z /data/solr/video/data ]] ++ echo /data/solr/video/data ++ cut -c1 + [[ / != \/ ]] + setStartTime + [[ Linux == \S\u\n\O\S ]] ++ date +%s + start=1224156482 + logMessage started by root ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2008/10/16 13:28:02 started by root + [[ -n '' ]] + logMessage command: ./bin/snapcleaner -V -D 1 ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2008/10/16 13:28:02 command: ./bin/snapcleaner -V -D 1 + [[ -n '' ]] + trap 'echo "caught INT/TERM, exiting now but partial cleanup may have already occured";logExit aborted 13' INT TERM + [[ -n 1 ]] + find /data/solr/video/data -maxdepth 0 -name foobar + '[' 0 = 0 ']' + maxdepth='-maxdepth 1' + logMessage cleaning up snapshots more than 1 days old ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2008/10/16 13:28:02 cleaning up snapshots more than 1 days old + [[ -n '' ]] ++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime +1 -print + logExit ended 0 + [[ Linux == \S\u\n\O\S ]] ++ date +%s + end=1224156482 ++ expr 1224156482 - 1224156482 + diff=0 ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo '2008/10/16 13:28:02 ended (elapsed time: 0 sec)' + exit 0 Chris Haggstrom wrote: > > > On Oct 16, 2008, at 3:10 AM, sunnyfr wrote: >> >> I've a wierd problem when I try to fire snapcleaner manually : >> Already : is it correct : [EMAIL PROTECTED]:/data/solr/video# >> ./bin/snapcleaner -V -D-1 >> >> To remove every snapshot older than one day. > > You need to change "-D -1" to "-D 1". Otherwise, you're trying to > remove snapshots older than -1 days, which is an invalid argument to > pass to 'find -mtime' as is shown in these lines of your debug output. 
> >> It doesn't remove older than one day obviously and debugger show me : >> >> + logMessage cleaning up snapshots more than -1 days old >> ++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime >> +-1 >> find: invalid argument `+-1' to `-mtime' > > > -Chris > > -- View this message in context: http://www.nabble.com/snapcleaner-%3E%3E-problem-solr-1.3-tp20010689p20011746.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: snapcleaner >> problem solr 1.3
On Oct 16, 2008, at 3:10 AM, sunnyfr wrote:
> I've a weird problem when I try to fire snapcleaner manually. Is this correct:
> [EMAIL PROTECTED]:/data/solr/video# ./bin/snapcleaner -V -D-1
> to remove every snapshot older than one day?

You need to change "-D -1" to "-D 1". Otherwise, you're trying to remove snapshots older than -1 days, which is an invalid argument to pass to 'find -mtime', as is shown in these lines of your debug output:

> It doesn't remove anything older than one day, obviously, and the debugger shows me:
> + logMessage cleaning up snapshots more than -1 days old
> ++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime +-1
> find: invalid argument `+-1' to `-mtime'

-Chris
Re: Solr search not displaying all the indexed values.
Yes. something similar to : But the searching will not give all the results even if there is only one result. whereas indexing is fine. Thanks con Noble Paul നോബിള് नोब्ळ् wrote: > > do you have 2 queries in 2 different entities? > > > On Thu, Oct 16, 2008 at 3:17 PM, con <[EMAIL PROTECTED]> wrote: >> >> I have two queries in my data-config.xml which takes values from multiple >> tables, like: >> select * from EMPLOYEE, CUSTOMER where EMPLOYEE.prod_id= >> CUSTOMER.prod_id. >> >> When i do a full-import it is indexing all the rows as expected. >> >> But when i search it with a *:* , it is not displaying all the values. >> Do I need any extra configurations? >> >> Thanks >> con >> -- >> View this message in context: >> http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010401.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > --Noble Paul > > -- View this message in context: http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20011033.html Sent from the Solr - User mailing list archive at Nabble.com.
snapcleaner >> problem solr 1.3
Hi guys, I've a wierd problem when I try to fire snapcleaner manually : Already : is it correct : [EMAIL PROTECTED]:/data/solr/video# ./bin/snapcleaner -V -D-1 To remove every snapshot older than one day. It doesn't remove older than one day obviously and debugger show me : + [[ -z -1 ]] + fixUser -V -D -1 + [[ -z '' ]] ++ whoami + user=root ++ whoami + [[ root != root ]] ++ who -m ++ cut '-d ' -f1 ++ sed '-es/^.*!//' + oldwhoami=root + [[ root == '' ]] + [[ -z /data/solr/video/data ]] ++ echo /data/solr/video/data ++ cut -c1 + [[ / != \/ ]] + setStartTime + [[ Linux == \S\u\n\O\S ]] ++ date +%s + start=1224151299 + logMessage started by root ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2008/10/16 12:01:39 started by root + [[ -n '' ]] + logMessage command: ./bin/snapcleaner -V -D -1 ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2008/10/16 12:01:39 command: ./bin/snapcleaner -V -D -1 + [[ -n '' ]] + trap 'echo "caught INT/TERM, exiting now but partial cleanup may have already occured";logExit aborted 13' INT TERM + [[ -n -1 ]] + find /data/solr/video/data -maxdepth 0 -name foobar + '[' 0 = 0 ']' + maxdepth='-maxdepth 1' + logMessage cleaning up snapshots more than -1 days old ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2008/10/16 12:01:39 cleaning up snapshots more than -1 days old + [[ -n '' ]] ++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime +-1 -print find: invalid argument `+-1' to `-mtime' + logExit ended 0 + [[ Linux == \S\u\n\O\S ]] ++ date +%s + end=1224151299 ++ expr 1224151299 - 1224151299 + diff=0 ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo '2008/10/16 12:01:39 ended (elapsed time: 0 sec)' + exit 0 Any idea why? thanks -- View this message in context: http://www.nabble.com/snapcleaner-%3E%3E-problem-solr-1.3-tp20010689p20010689.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr search not displaying all the indexed values.
do you have 2 queries in 2 different entities? On Thu, Oct 16, 2008 at 3:17 PM, con <[EMAIL PROTECTED]> wrote: > > I have two queries in my data-config.xml which takes values from multiple > tables, like: > select * from EMPLOYEE, CUSTOMER where EMPLOYEE.prod_id= CUSTOMER.prod_id. > > When i do a full-import it is indexing all the rows as expected. > > But when i search it with a *:* , it is not displaying all the values. > Do I need any extra configurations? > > Thanks > con > -- > View this message in context: > http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010401.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- --Noble Paul
Solr search not displaying all the indexed values.
I have two queries in my data-config.xml which takes values from multiple tables, like: select * from EMPLOYEE, CUSTOMER where EMPLOYEE.prod_id= CUSTOMER.prod_id. When i do a full-import it is indexing all the rows as expected. But when i search it with a *:* , it is not displaying all the values. Do I need any extra configurations? Thanks con -- View this message in context: http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010401.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: error with delta import
On Thu, Oct 16, 2008 at 2:08 PM, Florian Aumeier <[EMAIL PROTECTED]> wrote:
> Noble Paul നോബിള് नोब्ळ् schrieb:
>> The delta implementation is a bit fragile in DIH for complex queries
>
> that's too bad. It's a nice interface and less complex to configure
> than going the XML /update way.
>
> Well, when doing it the way you described below (full-import with the
> delta query), the '${dataimporter.last_index_time}' timestamp is empty:

I guess this was fixed post 1.3. You can probably take
dataimporthandler.jar from a nightly build (you may also need to add
slf4j.jar).

> Oct 16, 2008 10:14:53 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
> SEVERE: Full Import failed
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: SELECT a.id AS article_id,a.stub AS article_stub,a.ref AS
> article_ref,a.id_blogs,a.title AS article_title, a.normalized_text, au.url
> AS article_url, bu.url AS blog_url, b.title AS blog_title,b.subtitle AS
> blog_subtitle, r.rank, coalesce(a.updated,a.published,a.added) as ts, a.stub
> as article_stub FROM articles a join blogs b on a.id_blogs = b.id join urls
> au on a.id_urls = au.id join urls bu on b.id_urls = bu.id LEFT OUTER JOIN
> ranks r on a.id = r.id_articles WHERE b.id_urls is not null AND a.hidden is
> false AND b.hidden is false AND a.ref is not null AND b.ref is not null and
> (rankid in (SELECT rankid FROM ranks order by rankid desc limit 1) OR rankid
> is null) AND coalesce(a.updated,a.published,a.added) > '' Processing
> Document # 1
>
> Regards
> Florian
>
>> I recommend you do delta-import using a full-import
>>
>> it can be done as follows
>> define a different entity
>>
>>   url="jdbc:postgresql://bm02:5432/bm" user="user" />
>>
>>   query="">
>>
>> when you wish to do a full-import pass the request parameter
>> entity=articles-full
>>
>> for delta-import use the request parameter
>> entity=articles-delta&clean=false (command has to be full-import only)
>>
>> On Wed, Oct 15, 2008 at 1:42 PM, Florian Aumeier
>> <[EMAIL PROTECTED]> wrote:
>>> Shalin Shekhar Mangar schrieb:
>>>> You are missing the "pk" field (primary key). This is used for delta
>>>> imports.
>>>
>>> I added the pk field and rebuilt the index yesterday. However, when I
>>> run the delta-import, I still have this error message in the log:
>>>
>>> INFO: Starting delta collection.
>>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>>> INFO: Running ModifiedRowKey() for Entity: articles
>>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
>>> INFO: Creating a connection for entity articles with URL: jdbc:postgresql://bm02:5432/bm
>>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
>>> INFO: Time taken for getConnection(): 43
>>> Oct 15, 2008 9:37:36 AM org.apache.solr.core.SolrCore execute
>>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>>> Oct 15, 2008 9:44:51 AM org.apache.solr.core.SolrCore execute
>>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>>> INFO: Completed ModifiedRowKey for Entity: articles rows obtained : 4584
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>>> INFO: Running DeletedRowKey() for Entity: articles
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>>> INFO: Completed DeletedRowKey for Entity: articles rows obtained : 0
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>>> INFO: Completed parentDeltaQuery for Entity: articles
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DataImporter doDeltaImport
>>> SEVERE: Delta Import Failed
>>> java.lang.NullPointerException
>>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
>>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
>>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
>>>     at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
>>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
>>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>> Oct 15, 2008 9:50:58 AM org.apache.solr.core.SolrCore execute
Re: error with delta import
Noble Paul നോബിള് नोब्ळ् schrieb:
> The delta implementation is a bit fragile in DIH for complex queries

that's too bad. It's a nice interface and less complex to configure than
going the XML /update way.

Well, when doing it the way you described below (full-import with the
delta query), the '${dataimporter.last_index_time}' timestamp is empty:

Oct 16, 2008 10:14:53 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT a.id AS article_id,a.stub AS article_stub,a.ref AS
article_ref,a.id_blogs,a.title AS article_title, a.normalized_text, au.url
AS article_url, bu.url AS blog_url, b.title AS blog_title,b.subtitle AS
blog_subtitle, r.rank, coalesce(a.updated,a.published,a.added) as ts, a.stub
as article_stub FROM articles a join blogs b on a.id_blogs = b.id join urls
au on a.id_urls = au.id join urls bu on b.id_urls = bu.id LEFT OUTER JOIN
ranks r on a.id = r.id_articles WHERE b.id_urls is not null AND a.hidden is
false AND b.hidden is false AND a.ref is not null AND b.ref is not null and
(rankid in (SELECT rankid FROM ranks order by rankid desc limit 1) OR rankid
is null) AND coalesce(a.updated,a.published,a.added) > '' Processing
Document # 1

Regards
Florian

> I recommend you do delta-import using a full-import
>
> it can be done as follows
> define a different entity
>
> when you wish to do a full-import pass the request parameter
> entity=articles-full
>
> for delta-import use the request parameter
> entity=articles-delta&clean=false (command has to be full-import only)
>
> On Wed, Oct 15, 2008 at 1:42 PM, Florian Aumeier
> <[EMAIL PROTECTED]> wrote:
>> Shalin Shekhar Mangar schrieb:
>>> You are missing the "pk" field (primary key). This is used for delta
>>> imports.
>>
>> I added the pk field and rebuilt the index yesterday. However, when I
>> run the delta-import, I still have this error message in the log:
>>
>> INFO: Starting delta collection.
>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>> INFO: Running ModifiedRowKey() for Entity: articles
>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
>> INFO: Creating a connection for entity articles with URL: jdbc:postgresql://bm02:5432/bm
>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
>> INFO: Time taken for getConnection(): 43
>> Oct 15, 2008 9:37:36 AM org.apache.solr.core.SolrCore execute
>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>> Oct 15, 2008 9:44:51 AM org.apache.solr.core.SolrCore execute
>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>> INFO: Completed ModifiedRowKey for Entity: articles rows obtained : 4584
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>> INFO: Running DeletedRowKey() for Entity: articles
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>> INFO: Completed DeletedRowKey for Entity: articles rows obtained : 0
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
>> INFO: Completed parentDeltaQuery for Entity: articles
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DataImporter doDeltaImport
>> SEVERE: Delta Import Failed
>> java.lang.NullPointerException
>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
>>     at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>> Oct 15, 2008 9:50:58 AM org.apache.solr.core.SolrCore execute
>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>>
>> Regards
>> Florian

--
Media Ventures GmbH
Entwicklung Blogmonitor.de
Jabber-ID [EMAIL PROTECTED]
Telefon +49 (0) 2236 480 10 22
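The two-entity setup Noble recommends above (the data-config snippet itself did not survive in the archive) might be sketched roughly as follows. This is a hypothetical reconstruction: only the JDBC URL, the entity names, and the request parameters come from the thread; the SQL and the driver attribute are placeholders.

```xml
<!-- Hypothetical sketch, not the original config: two entities sharing one
     JDBC data source. The "delta" entity reuses the full query but filters
     on ${dataimporter.last_index_time}, so both can be run with
     command=full-import. -->
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://bm02:5432/bm" user="user"/>
  <document>
    <!-- full rebuild: /dataimport?command=full-import&entity=articles-full -->
    <entity name="articles-full" pk="article_id"
            query="SELECT a.id AS article_id, ... FROM articles a ..."/>
    <!-- incremental run:
         /dataimport?command=full-import&entity=articles-delta&clean=false -->
    <entity name="articles-delta" pk="article_id"
            query="SELECT a.id AS article_id, ... FROM articles a ...
                   WHERE coalesce(a.updated, a.published, a.added)
                         &gt; '${dataimporter.last_index_time}'"/>
  </document>
</dataConfig>
```

The point of the trick is that only ${dataimporter.last_index_time} substitution is needed, not DIH's separate deltaQuery/deltaImportQuery machinery, which is what was throwing the NullPointerException above.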
Advice on analysis/filtering?
Hello, group.

I'm trying to create a search facility for documents in "broken" Polish (by broken I mean "not compliant with the language's rules"), searchable by terms in "broken" Polish that are broken in many other ways than the documents are. See this example:

document text: "włatcy móch" (in proper Polish this would be "władcy much")
example terms that should match: "włatcy much", "wlatcy moch", "wladcy much"

This double brokenness ruled out all Polish stemmers currently available for Lucene, and now I am back at square one. The search results do not have to be 100% accurate: some missing results are acceptable, but "false positives" are not.

Is this at all possible with the machinery Solr provides (I do not hold a PhD in linguistics), or should I ask the business to lower their expectations?

--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]
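For what it's worth, plain accent folding only covers part of the example above. A quick illustration (editor's sketch, not Solr code, with a hand-rolled fold map for Polish diacritics):

```python
# Illustration only: fold Polish diacritics to ASCII and compare,
# to show how far plain accent folding gets on the example terms.
FOLD = str.maketrans("ąćęłńóśźż", "acelnoszz")

def fold(text: str) -> str:
    """Lowercase and strip Polish diacritics."""
    return text.lower().translate(FOLD)

doc = fold("włatcy móch")  # -> "wlatcy moch"
for term in ["włatcy much", "wlatcy moch", "wladcy much"]:
    print(term, "matches" if fold(term) == doc else "misses")
# Only "wlatcy moch" matches: folding handles the diacritic-only variant,
# but not the t/d and o/u substitutions, which would need a phonetic or
# fuzzy layer on top.
```

So folding alone catches the diacritic variants; the consonant/vowel swaps in the other terms would need something like phonetic or edit-distance matching, at the cost of risking the false positives you want to avoid.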