Re: Boosting of words
Hi, I am using Solr 1.3. I access Solr through Carrot and use Java. Regards, Bhaskar

--- On Thu, 10/15/09, AHMET ARSLAN iori...@yahoo.com wrote:
From: AHMET ARSLAN iori...@yahoo.com Subject: Re: Boosting of words To: solr-user@lucene.apache.org Date: Thursday, October 15, 2009, 8:58 AM

> Hi, I am able to see the results when I pass the values in the query browser. When I pass the query below, I am able to see the difference in output: http://localhost:8983/solr/select/?q=java^100%20technology^1 But users cannot be expected to pass these values in the query browser each time to see the output. So where exactly should the value java^100 technology^1 be set? In which file, and in which location, to be precise? Please help me.

Although I do not quite understand you: you need to URL-encode your parameter values before you invoke an HTTP GET, i.e. parameter=urlencode(value, UTF-8). Try this URL: /select/?q=java%5E100+OR+technology%5E1&version=2.2 Note that the space is encoded as +, and ^ is encoded as %5E. What kind of Solr client are you using? How are you accessing Solr? From Java, PHP, Ruby?
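To make the URL-encoding suggestion above concrete, here is a minimal Java sketch (the class and method names are illustrative, not from the thread) using the standard java.net.URLEncoder:

```java
import java.net.URLEncoder;

public class BoostQueryUrl {
    // URL-encode a raw Solr query string for use in an HTTP GET:
    // '^' becomes %5E and spaces become '+'.
    public static String encode(String rawQuery) throws Exception {
        return URLEncoder.encode(rawQuery, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String encoded = encode("java^100 OR technology^1");
        // Produces: /select/?q=java%5E100+OR+technology%5E1&version=2.2
        System.out.println("/select/?q=" + encoded + "&version=2.2");
    }
}
```

The boost query itself would typically be built server-side (or set as a default in the request handler) so users never have to type it into the browser.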
Re: Using DIH's special commands....Help needed
It is strange that LogTransformer did not log the data.

On Fri, Oct 16, 2009 at 5:54 PM, William Pierce evalsi...@hotmail.com wrote:

Folks: Continuing my saga with DIH and the use of its special commands. I have verified that the script functionality is indeed working. I have also verified that '$skipRow' is working. But I don't think that '$deleteDocById' is working. My script now looks as follows:

<script><![CDATA[
  function DeleteRow(row) {
    var jid = row.get('Id');
    var jis = row.get('IndexingStatus');
    if (jis == 4) {
      row.put('$deleteDocById', jid);
      row.remove('Col1');
      row.put('Col1', jid);
    }
    return row;
  }
]]></script>

The theory is that rows whose 'IndexingStatus' value is 4 should be deleted from the Solr index. Just to be sure that the JavaScript syntax was correct, I intentionally overwrite a field called 'Col1' in my schema with the primary key of the document to be deleted. On a clean and empty index, I import 47 rows from my dummy db. Everything checks out correctly since IndexingStatus for each row is 1; there are no rows to delete. I then go into the db and set one row's IndexingStatus = 4. When I execute the dataimport, I find that all 47 documents are imported. For the row whose 'IndexingStatus' was set to 4, the Col1 value is correctly set by the script transformer to the primary key value for that row/document. However, I should not be seeing that document at all, since '$deleteDocById' should have deleted it from Solr. Could this be a bug in Solr? Or am I misunderstanding how $deleteDocById works? By the way, Noble, I tried to set up the LogTransformer and add logging per your suggestion. That did not work either. I set logLevel=debug, and also turned Solr logging in the admin console up to the max value (finest), and still got no output. Thanks, - Bill

-- From: Noble Paul നോബിള് नोब्ळ् 
noble.p...@corp.aol.com Sent: Thursday, October 15, 2009 10:05 PM To: solr-user@lucene.apache.org Subject: Re: Using DIH's special commands....Help needed

Use LogTransformer to see if the value is indeed set:

<entity name="post"
        transformer="script:DeleteRow,RegexTransformer,LogTransformer"
        logTemplate="${post}"
        query="select Id, a, b, c, IndexingStatus from prod_table
               where (IndexingStatus = 1 or IndexingStatus = 4)">

This should print out the entire row after the transformations.

On Fri, Oct 16, 2009 at 3:04 AM, William Pierce evalsi...@hotmail.com wrote:

Thanks for your reply! I tried your suggestion. No luck. I have verified that I have version 1.6.0_05-b13 of Java installed. I am running with the nightly bits of October 7. I am pretty much out of ideas at the present time...I'd appreciate any tips/pointers. Thanks, - Bill

-- From: Shalin Shekhar Mangar shalinman...@gmail.com Sent: Thursday, October 15, 2009 1:42 PM To: solr-user@lucene.apache.org Subject: Re: Using DIH's special commands....Help needed

On Fri, Oct 16, 2009 at 12:46 AM, William Pierce evalsi...@hotmail.com wrote:

Thanks for your help. Here is my DIH config file...I'd appreciate any help/pointers you may give me. No matter what I do, the documents are not getting deleted from the index. My db has rows whose 'IndexingStatus' field has a value of either 1 (which means add it to Solr) or 4 (which means delete the document with that primary key from the Solr index). I have two transformers running. Not sure what I am doing wrong. 
<dataConfig>
  <script><![CDATA[
    function DeleteRow(row) {
      var jis = row.get('IndexingStatus');
      var jid = row.get('Id');
      if (jis == 4) {
        row.put('$deleteDocById', jid);
      }
      return row;
    }
  ]]></script>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db" user="**" password="***"/>
  <document>
    <entity name="post" transformer="script:DeleteRow,RegexTransformer"
            query="select Id, a, b, c, IndexingStatus from prod_table
                   where (IndexingStatus = 1 or IndexingStatus = 4)">
      <field column="ptype" splitBy="," sourceColName="a"/>
      <field column="wauth" splitBy="," sourceColName="b"/>
      <field column="miles" splitBy="," sourceColName="c"/>
    </entity>
  </document>
</dataConfig>

One thing I'd try is to use '4' for the comparison rather than the number 4 (the type would depend on the SQL type). Also, for JavaScript transformers to work, you must use JDK 6, which has built-in JavaScript support.
Re: stats page slow in latest nightly
On Tue, Oct 6, 2009 at 5:51 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : When I was working on it, I was actually going to default to not show : the size, and make you click a link that added a param to get the sizes : in the display too. But I foolishly didn't bring it up when Hoss made my : life easier with his simpler patch. we can always turn the size estimator off ... or turn it on only when doing the insanity checks (so normal stats are fast, but if anything is duplicated you'll get info on the size of the discrepancy) Is this something we want to do before release? I'm not at all familiar with the new size estimator stuff, so I'm not sure how long it can actually take for a big index. -Yonik http://www.lucidimagination.com
urgent need of some basic help
I'm in need of some basic help regarding Solr. 1) In which format is posted data stored in Solr? How is the data stored in Solr? 2) What is the concept of replication in Solr? 3) Suppose in my schema.xml I had the format id,no,name and I had posted nearly 50 documents. Now I need to post data in the format id,name,address. How can I do this? Do I need to index all the posted files again, or is there some other option available?
Store tika extracted result as xhtml
Dear All, I have a field defined in schema.xml as below:

<fieldtype name="string" class="solr.StrField" sortMissingLast="true" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
<field name="original" type="string" indexed="false"/>

and in solrconfig.xml:

<str name="fmap.content">original</str>

Basically, when I upload the document via the command below

curl 'http://localhost:8983/solr/info/update/extract?map.content=text_shingle&literal.url=test&commit=true' -F fi...@mccm.pdf

and try to display the field via a query, it shows:

Take A Chance On Me Take A Chance On Me Monte Carlo Condensed Matter A very brief guide to Monte Carlo simulation. An explanation of what I do. A chance for far too many ABBA puns ...

The above is not XHTML(!) However, if I run the command below with extractOnly=true

curl 'http://localhost:8983/solr/info/update/extract?map.content=text_shingle&literal.url=test&extractOnly=true' -F fi...@mccm.pdf

I get the result (returned escaped inside the response XML):

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Take A Chance On Me</title>
</head>
<body>
<div> ...

which is XHTML output. My objective is to store it as XHTML in the field and be able to retrieve it as cached output. Since Tika is already producing XHTML output, I wonder why Solr saves it as plain text. (Maybe I missed something in the configuration?) Also, I will be using SolrJ as the application layer, so as a workaround, if there is any way I can get the XHTML result, maybe I can store it somewhere else outside of Solr. Any advice on this will be highly appreciated. Many thanks. Kind Regards, Andy
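Regarding the workaround above: since the extractOnly=true response embeds the XHTML as an escaped string inside the response XML, a SolrJ-side client would need to unescape it before storing it elsewhere. A minimal sketch follows; the helper class below is hypothetical (not part of SolrJ), and a real client might use a full XML unescaping utility instead of this handful of entities:

```java
public class XhtmlUnescape {
    // Minimal entity unescape for the escaped XHTML string returned by
    // extractOnly=true. "&amp;" is handled last on purpose, so that a
    // doubly-escaped "&amp;lt;" decodes to "&lt;" rather than "<".
    public static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String escaped = "&lt;title&gt;Take A Chance On Me&lt;/title&gt;";
        System.out.println(unescape(escaped));
    }
}
```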
Re: stats page slow in latest nightly
: we can always turn the size estimator off ... or turn it on only when : doing the insanity checks (so normal stats are fast, but if anything is : duplicated you'll get info on the size of the discrepancy) : : Is this something we want to do before release? I'm not at all : familiar with the new size estimator stuff, so I'm not sure how long : it can actually take for a big index. crap ... this slipped my mind. Yeah, we probably ought to do it before the release. I suspect if you've got things tuned for lots of little segments it could be so slow as to be worthless. I won't have access to the code until Monday, but I'm pretty sure this should be a fairly trivial change (just un-set the estimator on the CacheEntry objects) -Hoss
Re: urgent need of some basic help
Hi, Naga. On 17-Oct-09, at 10:18 AM, Naga raja wrote:

> I'm in need of some basic help regarding Solr. 1) In which format is posted data stored in Solr? How is the data stored?

Once Solr has ingested the data, it is stored in binary files in a Lucene index. You can see the files in the data/index directory of your Solr instance, and you can open that Lucene index with something like Luke: http://www.getopt.org/luke/ I find that looking at your index with Luke is a very helpful way of understanding exactly what is being stored.

> 2) What is the concept of replication in Solr?

Sometimes you want two Solr indexes that contain the same data. For example, one common situation is when you want one index on a machine where documents are processed and written to an index (which can be slow), and a separate index that's only used for searching (which you want to be as fast as possible). You could do this by replicating index one to index two. Is that what you're asking about? There are several ways of replicating Solr indexes. Make sure you check out these pages on the wiki: http://wiki.apache.org/solr/CollectionDistribution http://wiki.apache.org/solr/SolrReplication

> 3) Suppose in my schema.xml I had the format id,no,name and I had posted nearly 50 documents. Now I need to post data in the format id,name,address. How can I do this? Do I need to index all the posted files again, or is there some other option available?

One concept that people sometimes have trouble with when they start using an index like Solr instead of a relational database is that not every field has to be populated in every document. It's totally fine to have some documents that have id, no, and name and others that have id, name, and address, but all four potential fields (id, no, name, and address) will have to be accounted for in your schema.xml. 
Remember to read the wiki, which has pretty much everything you ever need to know about Solr: http://wiki.apache.org/solr/ There is also a good Solr book available now: http://www.packtpub.com/solr-1-4-enterprise-search-server I hope this helps! Bess

Elizabeth (Bess) Sadler Chief Architect for the Online Library Environment Box 400129 Alderman Library University of Virginia Charlottesville, VA 22904
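To make the answer to question 3 concrete, here is a sketch of the kind of schema.xml fragment Bess describes. The field names come from the question; the types and attributes are illustrative assumptions, not a tested configuration:

```xml
<!-- All four potential fields are declared in the schema;
     each document then populates only the subset it actually has. -->
<field name="id"      type="string" indexed="true" stored="true" required="true"/>
<field name="no"      type="string" indexed="true" stored="true"/>
<field name="name"    type="text"   indexed="true" stored="true"/>
<field name="address" type="text"   indexed="true" stored="true"/>
```

With this in place, the 50 existing id,no,name documents and any new id,name,address documents can coexist in the same index without reindexing the old ones (adding a new field to the schema does not require reindexing documents that never had it).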
Re: Boosting of words
> I am using Solr 1.3. I access Solr through Carrot and use Java.

What is the meaning of accessing Solr through Carrot? Are you using Solr as an input to Carrot2? Using org.carrot2.source.solr.SolrDocumentSource just to cluster search results? Can we say that you are interested in clustered search results rather than the search results themselves? If so, Solr 1.4 will have Grant Ingersoll's ClusteringComponent [1], which uses Carrot2 to cluster search results. [1] http://wiki.apache.org/solr/ClusteringComponent
Re: Using DIH's special commands....Help needed
I had this problem also, but I was using the Jetty example. I fail at logging configurations about 90% of the time, so I assumed it was my fault.

2009/10/17 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
> It is strange that LogTransformer did not log the data. [...]
Problem with Query Parser
Hi everybody, I have a simple but (for me) annoying problem. I'm a happy user of Solr 1.4 with a small collection of documents. Today one of the users reported that a query returns documents that are non-pertinent to the expression. I have Spanish, Portuguese and English text inside the collection. Using the Solr administration interface I found that she was right: if I search for the Spanish term represion, I get matches on just the word root; that is, it returns every document containing the term repres. Using the admin debug search I found this:

<lst name="debug">
  <str name="rawquerystring">description:represion</str>
  <str name="querystring">description:represion</str>
  <str name="parsedquery">description:repres</str>
  <str name="parsedquery_toString">description:repres</str>

The "ion" part of the term was deleted by the query parser. The first question is: I don't know where I should look to correct this, schema.xml or solrconfig.xml. In the schema, description is:

<field name="description" type="text" indexed="true" multiValued="true" stored="true"/>

and text is:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>

The only thing that is suspicious to me is the EnglishPorter filter. I've deleted it from the configuration, but nothing changes. Should I reindex the collection to see the changes? Should I also delete it from the index section? What will I lose by deleting EnglishPorter? Thanks a lot for the help. German
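For reference, German's suspicion is well-founded: the EnglishPorterFilterFactory stems tokens at both index and query time, so "represion" is reduced to "repres" on both sides and matches every stemmed variant. Removing it from only the query analyzer would not help, because the index already contains stemmed terms; the collection must be reindexed after the change. A sketch of the fieldtype without the stemmer (untested, built from the configuration quoted above):

```xml
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- EnglishPorterFilterFactory removed from BOTH analyzers:
         Spanish/Portuguese terms are no longer stemmed with English rules,
         at the cost of losing stemmed matching for English text. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>
```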
Re: Using DIH's special commands....Help needed
postImportDeleteQuery is fine in your case.

On Sat, Oct 17, 2009 at 3:16 AM, William Pierce evalsi...@hotmail.com wrote:

Shalin, Many thanks for your tip...But it did not seem to help! Do you think I can use postImportDeleteQuery for this task? Should I file a bug report? Cheers, Bill

-- From: Shalin Shekhar Mangar shalinman...@gmail.com Sent: Friday, October 16, 2009 1:16 PM To: solr-user@lucene.apache.org Subject: Re: Using DIH's special commands....Help needed

On Fri, Oct 16, 2009 at 5:54 PM, William Pierce evalsi...@hotmail.com wrote:

> Folks: Continuing my saga with DIH and use of its special commands. [...] Could this be a bug in Solr? Or, am I misunderstanding how $deleteDocById works? 
Would the row which has IndexingStatus=4 also create a document with the same uniqueKey which you would delete using the transformer? If yes, that can explain what is happening and you can avoid that by adding a $skipDoc flag in addition to the $deleteDocById flag. I know this is a basic question but you are using Solr 1.4, aren't you? -- Regards, Shalin Shekhar Mangar. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
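For the postImportDeleteQuery route Noble endorses above, here is a sketch of how it would sit in the entity definition. This is untested and makes one assumption not stated in the thread: that IndexingStatus is indexed as a field in the schema, since postImportDeleteQuery is a Solr query run against the index after the import completes:

```xml
<entity name="post"
        transformer="RegexTransformer"
        postImportDeleteQuery="IndexingStatus:4"
        query="select Id, a, b, c, IndexingStatus from prod_table
               where (IndexingStatus = 1 or IndexingStatus = 4)">
  <field column="ptype" splitBy="," sourceColName="a"/>
  <field column="wauth" splitBy="," sourceColName="b"/>
  <field column="miles" splitBy="," sourceColName="c"/>
</entity>
```

With this approach the script transformer and $deleteDocById are no longer needed: rows with status 4 are imported and then removed in a single cleanup delete after the import.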