Re: Scoring for specific field queries
I will try this out. How do 1 and 2 boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 1:29 PM, Avlesh Singh avl...@gmail.com wrote: You would need to boost your startswith matches artificially for the desired behavior. I would do it this way - 1. Create a KeywordTokenized field with an n-gram filter. 2. Create a Whitespace-tokenized field with an n-gram filter. 3. Search on both fields, boosting matches for #1 over #2. Hope this helps. Cheers Avlesh On Thu, Oct 8, 2009 at 10:30 AM, R. Tan tanrihae...@gmail.com wrote: Hi, How can I get a wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means a higher score. For example, I have multiple documents with titles containing the word champion. Some of the document titles start with the word champion and some are entitled "we are the champions". The ones that start with the keyword need to rank first or score higher. Is there a way to do this? I'm using this query for an auto-suggest feature where the keyword doesn't necessarily need to be the first word. Rihaed
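A side note on the n-gram part, sketched in plain Python rather than Solr's actual analysis chain (a simulation, not the real filter classes): NGramFilterFactory emits every substring up to maxGramSize, so both suggested fields also match mid-word occurrences; if strictly prefix-only matching is wanted, EdgeNGramFilterFactory (side="front") on the keyword-tokenized field emits prefixes only.

```python
def ngrams(tok, lo=1, hi=20):
    # NGramFilterFactory behavior: every substring of length lo..hi
    return {tok[i:i + n] for n in range(lo, hi + 1)
            for i in range(len(tok) - n + 1)}

def edge_ngrams(tok, lo=1, hi=20):
    # EdgeNGramFilterFactory (side="front") behavior: prefixes only
    return {tok[:n] for n in range(lo, min(hi, len(tok)) + 1)}

starts = "champion of the world"
middle = "we are the champions"

# With plain n-grams, "cha" matches both titles...
assert "cha" in ngrams(starts) and "cha" in ngrams(middle)
# ...but edge n-grams of the whole-string (keyword) token contain "cha"
# only when the title actually starts with it.
assert "cha" in edge_ngrams(starts)
assert "cha" not in edge_ngrams(middle)
```

In other words, with plain n-grams the boost in step 3 only nudges startswith matches upward, whereas edge n-grams on the keyword-tokenized field make the distinction hard.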
Re: Scoring for specific field queries
This might work and I also have a single-value field, which makes it cleaner. Can sort be customized (with indexOf()) from the Solr parameters alone? Thanks! On Thu, Oct 8, 2009 at 1:40 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Hi Rihaed, I guess we don't need to depend on scores all the time. You can use a custom sort to sort the results. Take a dynamicField, fill it with the indexOf(keyword) value, and sort the results by that field in ascending order. Then the records which contain the keyword at an earlier position will come first. Regards, Sandeep R. Tan wrote: Hi, How can I get wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means higher score. -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25798657.html Sent from the Solr - User mailing list archive at Nabble.com.
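Sandeep's suggestion can be prototyped outside Solr to see the ordering it produces; a minimal sketch in plain Python (the document data is made up, and inf pushes non-matching docs to the end):

```python
docs = [
    {"id": 1, "title": "we are the champions"},
    {"id": 2, "title": "champion"},
    {"id": 3, "title": "the champion within"},
]

def index_of(title, keyword):
    # position of the keyword's first occurrence, like String.indexOf();
    # non-matching documents sort last
    pos = title.find(keyword)
    return pos if pos >= 0 else float("inf")

keyword = "champion"
ranked = sorted(docs, key=lambda d: index_of(d["title"], keyword))
print([d["id"] for d in ranked])  # titles starting with the keyword come first
```

In Solr the same ordering would come from sorting ascending on the field that was filled with the indexOf value at index time.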
Re: Scoring for specific field queries
Hi Avlesh, Thanks for your attention to my post. 1. If the word computer occurs multiple times in a document, what would you do in that case? Is this dynamic field supposed to be multivalued? I can't even imagine what you would do if the word computer occurs in multiple documents multiple times. = It doesn't matter how many times a word occurs in a document. Consider its first occurrence and use it for sorting. The dynamic field should not be multivalued. If the keyword occurs at the same position in multiple documents, then the document which was inserted first will come first. 2. Multivalued fields cannot be sorted upon. = Yes, I agree. 3. One needs to know the unique number of such keywords before implementing, because you'll potentially end up creating that many fields. = I didn't get this. Why should one know the unique number of keywords before implementation? If we have the logic, it works for all keywords. Most people do the same in the case of geographical sorting: they calculate the distance and sort by it before displaying the results. They don't need to worry about which distance the user requests. Please tell me your thoughts and correct me if I am wrong. Thanks, Sandeep
Sorting by insertion time
Hi, Quite often I want a set of documents ordered by the time they were inserted, i.e. give me the 5 latest items that match query foo. I usually solve this by sorting on a date field. I had a chat with Erik Hatcher when he visited Javazone 2009 and he said that Solr places documents on disk in insertion order. This would make it possible for me to save a sorting step by not sorting on a specific field, but by insertion time in reverse. AFAIK Lucene knows how to do this, but which request parameters should I use in Solr? Kind regards, Tarjei -- Tarjei Huse Mobil: 920 63 413
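For the record, the usual low-cost pattern (field name and type here are illustrative, not from the original schema) is to let Solr stamp each document with an insertion time and sort on it explicitly, rather than relying on on-disk order:

```
<!-- schema.xml: a timestamp filled in automatically at add time -->
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>

<!-- query: the 5 newest documents matching foo -->
http://localhost:8983/solr/select?q=foo&sort=timestamp+desc&rows=5
```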
Re: Scoring for specific field queries
Yes, it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html. Let us know if you have any issues. Sandeep. R. Tan wrote: This might work and I also have a single-value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone?
Re: Facet query pb
clico wrote: That's not a problem. I want to use that in order to drill down a tree. Christian Zambrano wrote: Clico, Because you are doing a wildcard query, the token 'AMERICA' will not be analyzed at all. This means that 'AMERICA*' will NOT match 'america'. On 10/07/2009 12:30 PM, Avlesh Singh wrote: I have no idea what pb means, but this is what you probably want - fq=(location_field:(NORTH AMERICA*)) Cheers Avlesh On Wed, Oct 7, 2009 at 10:40 PM, clico cl...@mairie-marseille.fr wrote: Hello, I have a problem trying to retrieve a tree with facet use. I've got a field location_field. Each doc in my index has a location_field. The location field can be continent/country/city. I have 2 queries: http://server/solr/select?fq=(location_field:NORTH*) : ok, retrieves docs. http://server/solr/select?fq=(location_field:NORTH AMERICA*) : not ok. I think with NORTH AMERICA I have a problem with the space character. Could you help me? I'm sorry, this syntax does not work anymore.
Re: Scoring for specific field queries
I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes, it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html. Let us know if you have any issues. Sandeep. R. Tan wrote: This might work and I also have a single-value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone?
Re: ISOLatin1AccentFilter before or after Snowball?
Now you got me wondering - which one should I like better? I didn't even know there was an alternative. :-) Chantal Koji Sekiguchi schrieb: No, ISOLatin1AccentFilterFactory is not deprecated. You can use either MappingCharFilterFactory + mapping-ISOLatin1Accent.txt or ISOLatin1AccentFilterFactory, whichever you'd like. Koji Jay Hill wrote: Correct me if I'm wrong, but wasn't ISOLatin1AccentFilterFactory deprecated in favor of: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> in 1.4? -Jay http://www.lucidimagination.com
Re: how to rename a schema field, whose values are indexed already?
I guess you can't do it; I tried it before. I had a field with the name 'KEYWORD' and I changed it to 'keyword', and it didn't work. Everything else was normal, but when I searched with 'KEYWORD' I got an exception saying undefined field, and when I searched with 'keyword' I got 0 results. It didn't work even after optimizing. I re-indexed the data and it worked. Regards, Sandeep. M.Noor wrote: how to rename a schema field, if its values are indexed already ??
how to post(index) large file of 5 GB or greater than this
Hi, I am new to Solr. I am able to index, search and update with files of small size (around 500 MB). But if I try to index a file of 5 GB or more, it gives a memory heap exception. While investigating I found that the post jar and post.sh load the whole file into memory. I used a workaround of dividing the file into small files, and it's working. Is there any other way to post a large file, as the above workaround is not feasible for a 1 TB file? Thanks -Pravin DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Re: how to rename a schema field, whose values are indexed already?
Without re-indexing the data, how do I rename any one of the schema fields? Sandeep Tagore wrote: I guess you can't do it. I tried it before. I had a field with name 'KEYWORD' and I changed it to 'keyword' and it didn't work. Everything else was normal; when I searched with 'KEYWORD' I got an exception saying undefined field, and when I searched with 'keyword' I got 0 results. It didn't work even after optimizing. I re-indexed the data and it worked. Regards, Sandeep. M.Noor wrote: how to rename a schema field, if its values are indexed already ??
Re: Ranking of search results
Hi Amit, I tried with the options you gave and added debugQuery=true at the end of the URL. I am getting output as:

<lst name="debug">
  <str name="rawquerystring">channel</str>
  <str name="querystring">channel</str>
  <str name="parsedquery">text:channel</str>
  <str name="parsedquery_toString">text:channel</str>
  <lst name="explain">
    <str name="http://hotmail">1.2682627 = (MATCH) fieldWeight(text:channel in 3), product of: 2.828427 = tf(termFreq(text:channel)=8) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=3)</str>
    <str name="http://share">1.0026497 = (MATCH) fieldWeight(text:channel in 19), product of: 2.236068 = tf(termFreq(text:channel)=5) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=19)</str>
    <str name="http://metacreek">0.6341314 = (MATCH) fieldWeight(text:channel in 10), product of: 1.4142135 = tf(termFreq(text:channel)=2) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=10)</str>
    <str name="http://yahoo">0.5124555 = (MATCH) fieldWeight(text:channel in 0), product of: 1.0 = tf(termFreq(text:channel)=1) 2.049822 = idf(docFreq=6, numDocs=20) 0.25 = fieldNorm(field=text, doc=0)</str>
    <str name="http://sharemarket">0.4483986 = (MATCH) fieldWeight(text:channel in 1), product of: 1.0 = tf(termFreq(text:channel)=1) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=1)</str>
    <str name="http://Altavista">0.4483986 = (MATCH) fieldWeight(text:channel in 5), product of: 1.0 = tf(termFreq(text:channel)=1) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=5)</str>
  </lst>
</lst>

What do the numeric terms denote? With these numeric values, will I be able to set a preference for my search links? If so, how?
Regards Bhaskar --- On Thu, 10/1/09, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: From: bhaskar chandrasekar bas_s...@yahoo.co.in Subject: Re: Ranking of search results To: solr-user@lucene.apache.org Date: Thursday, October 1, 2009, 7:34 PM Hi Amit, Thanks for your reply. How do I set a preference for which links should appear first and second in the search results? Which configuration file in Solr needs to be modified to achieve this? Regards Bhaskar From: Amit Nithian anith...@gmail.com Subject: Re: Ranking of search results To: solr-user@lucene.apache.org Date: Wednesday, September 23, 2009, 11:33 AM It depends on several things: 1) The query handler that you are using 2) The fields that you are searching on and the default fields specified. For the default handler, it will issue a query for the default field and return results accordingly. To see what is going on, pass debugQuery=true at the end of the URL to see detailed output. If you are using the DisMaxHandler (Disjunction Max) then you will have qf, pf and bf (query fields, phrase fields, boosting function). I would start by looking at http://wiki.apache.org/solr/DisMaxRequestHandler - Amit On Wed, Sep 23, 2009 at 10:25 AM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, When I give an input string for search in Solr, it displays the corresponding results for the given input string. How are the results ranked and displayed? On what basis are the search results displayed? Is there an algorithm followed for ordering the results? Regards Bhaskar
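To answer the question in the message above: each explain entry is a product of term frequency, inverse document frequency and the field norm, per Lucene's DefaultSimilarity (tf = sqrt(termFreq), idf = 1 + ln(numDocs/(docFreq+1))). The top score from the debug output reproduces exactly:

```python
import math

# numbers taken from the explain entry for http://hotmail in the debug output
term_freq, doc_freq, num_docs, field_norm = 8, 6, 20, 0.21875

tf = math.sqrt(term_freq)                      # 2.828427...
idf = 1 + math.log(num_docs / (doc_freq + 1))  # 2.049822...
score = tf * idf * field_norm
print(score)  # ~1.2682627, the reported score
```

These scores are relative, not absolute, so they cannot be used directly to pin specific links to fixed positions; for that, boosting (e.g. the DisMax qf/bf parameters mentioned in Amit's reply) or an explicit sort is the usual route.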
Re: Scoring for specific field queries
Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting:

<fieldType name="autoComplete" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="autoComplete2" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My query is this:

q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0

What should I tweak in the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes, it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html. Let us know if you have any issues. Sandeep. R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone?
Re: Scoring for specific field queries
Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting: [the autoComplete and autoComplete2 fieldType definitions from the previous message - snipped] My query is this: q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak in the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes, it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html. Let us know if you have any issues. Sandeep. R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone?
Re: ISOLatin1AccentFilter before or after Snowball?
In this particular case, I don't think one is better than the other... In general, MappingCharFilter is more flexible than specific TokenFilters, such as ISOLatin1AccentFilter. For example, if you want your own character mapping rules, you can add them to the mapping file. It should be easier than modifying TokenFilters, as you don't need programming. Koji Chantal Ackermann wrote: Now you got me wondering - which one should I like better? I didn't even know there was an alternative. :-) Chantal Koji Sekiguchi schrieb: No, ISOLatin1AccentFilterFactory is not deprecated. You can use either MappingCharFilterFactory + mapping-ISOLatin1Accent.txt or ISOLatin1AccentFilterFactory, whichever you'd like. Koji Jay Hill wrote: Correct me if I'm wrong, but wasn't ISOLatin1AccentFilterFactory deprecated in favor of: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> in 1.4? -Jay http://www.lucidimagination.com
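For example, the mapping file format is just "source" => "target" pairs, one rule per line, so adding a custom rule needs no code (the rules shown here are illustrative additions, not from the stock file):

```
# lines starting with # are comments
"À" => "A"
"é" => "e"
"œ" => "oe"
```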
Re: how to rename a schema field, whose values are indexed already?
On Thu, Oct 8, 2009 at 4:32 PM, noor noo...@opentechindia.com wrote: Without re-indexing the data, how do I rename any one of the schema fields? Solr does not support renaming without re-indexing. Re-indexing is your best bet. If you cannot re-index for some reason and you have all fields stored, then you can write a program to read all documents and write new ones with the same values. -- Regards, Shalin Shekhar Mangar.
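The read-and-rewrite program Shalin describes boils down to copying every stored document with one key renamed before re-posting; sketched here on plain dicts rather than the SolrJ API (field names are made up):

```python
def rename_field(doc, old, new):
    # Copy a stored document, moving the value from the old key to the new one.
    out = {k: v for k, v in doc.items() if k != old}
    if old in doc:
        out[new] = doc[old]
    return out

stored = [{"id": "1", "KEYWORD": "solr"}, {"id": "2", "KEYWORD": "lucene"}]
reindexed = [rename_field(d, "KEYWORD", "keyword") for d in stored]
print(reindexed[0])
```

The same loop in a real program would read pages of documents from a select query and post the renamed copies back through the update handler.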
Re: solr reporting tool adapter
Hi Lance, Thanks a ton, will look into BIRT. Regards, Raakhi On Thu, Oct 8, 2009 at 1:22 AM, Lance Norskog goks...@gmail.com wrote: The BIRT project can do what you want. It has a nice form creator and you can configure HTTP XML input formats. It includes very complete Eclipse plugins and there is a book about it. On 10/7/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Oct 7, 2009 at 2:51 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: we basically want to generate PDF reports which contain tag clouds, bar charts, pie charts, etc. Faceting on a field will give you top terms and frequency information which can be used to create tag clouds. What do you want to plot on a bar chart? I don't know of a reporting tool which can hook into Solr for creating such things. -- Regards, Shalin Shekhar Mangar. -- Lance Norskog goks...@gmail.com
Re: Facet query pb
clico wrote: [earlier messages quoted in full in the first "Facet query pb" message above - snipped] I'm sorry, this syntax does not work anymore. When I try debug mode, here is the result:

<arr name="parsed_filter_queries">
  <str>+location_field:NORTH +location_field:AMERICA*</str>
</arr>

My location_field is of type string, containing values like NORTH AMERICA/NY/NYC. Thanks for helping me.
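A hedged aside (untested against this exact schema): the debug output shows the filter splitting at the whitespace into two clauses. On a string-typed field, the space can be kept inside a single prefix term by escaping it, so the whole path stays one prefix query:

```
fq=location_field:NORTH\ AMERICA*

URL-encoded: fq=location_field%3ANORTH%5C%20AMERICA*
```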
issue in adding data to a multivalued field
Hi, I have a small schema with some of the fields defined as:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="content" type="text" indexed="true" stored="true" multivalued="false"/>
<field name="author_name" type="text" indexed="true" stored="false" multivalued="true"/>

where the field author_name is multivalued. However, in the UI (schema browser), the following are the details of the author_name field; it's nowhere mentioned that it's multivalued:

Field: author_name
Field Type: text
Properties: Indexed, Tokenized

When I try creating and adding a document into Solr, I get an exception: ERROR_id1_multiple_values_encountered_for_non_multiValued_field_author_name_ninad_raakhi_goureya_sheetal. Here's my code snippet:

solrDoc17.addField("id", "id1");
solrDoc17.addField("content", "SOLR");
solrDoc17.addField("author_name", "ninad");
solrDoc17.addField("author_name", "raakhi");
solrDoc17.addField("author_name", "goureya");
solrDoc17.addField("author_name", "sheetal");
server.add(solrDoc17);
server.commit();

Any pointers? Regards, Raakhi
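A guess at the cause, not a confirmed diagnosis: schema attribute names are case-sensitive, and the declarations above use lowercase multivalued, which Solr silently ignores, so author_name defaults to single-valued, which matches both the schema browser output and the error reported. The corrected declaration would be:

```xml
<field name="author_name" type="text" indexed="true" stored="false" multiValued="true"/>
```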
Re: How to retrieve the index of a string within a field?
Sandeep, When I submit a query, I actually make sure the searched phrase is wrapped in double quotes. When I do that, it will only return sentences with 'get what you'. If it does not have double quotes, it will return all the sentences as described in your email, because without double quotes it is a 'get OR what OR you' query. I don't know too much about the concepts behind search; I just make use of whatever works for me. Do you think I am still ok using text as my sentence field type? If the return is hundreds of thousands of results, will Solrj's HTTP call hang up on it? Thanks a lot. Elaine On Thu, Oct 8, 2009 at 1:31 AM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Elaine, The field type text contains <tokenizer class="solr.WhitespaceTokenizerFactory"/> in its definition, so all the sentences that are indexed / queried will be split into words. So when you search for 'get what you', you will get sentences containing get, what, you, get what, get you, what you, get what you. So when you try to find the indexOf of the keyword in that sentence (from the results), you may not get it all the time. Solrj can give the results in one shot, but it uses an HTTP call; you can't avoid it. You don't need to query multiple times with Solrj. Query once, get the results, store them in Java beans, process them and display the results. Regards, Sandeep Elaine Li wrote: Sandeep, I do get results when I search for get what you, not 0 results. What in my schema makes this difference? I need to learn Solrj. I am currently using JavaScript as a client and invoke HTTP calls to get results to display in the browser. Can Solrj get all the results in one shot without the HTTP call? I need to do some postprocessing on all the results and then display the processed data. Submitting multiple HTTP queries and post-processing after each query does not seem to be the right way.
Re: how to post(index) large file of 5 GB or greater than this
You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar *.xml. Or I split the file if it is too big. Elaine On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: Hi, I am new to Solr. I am able to index, search and update with files of small size (around 500 MB). But if I try to index a file of 5 GB or more, it gives a memory heap exception. While investigating I found that the post jar and post.sh load the whole file into memory. I used a workaround of dividing the file into small files, and it's working. Is there any other way to post a large file, as the above workaround is not feasible for a 1 TB file? Thanks -Pravin
Re: how to post(index) large file of 5 GB or greater than this
You can write a simple program which streams the file from disk to post it to Solr. On Thu, Oct 8, 2009 at 7:10 PM, Elaine Li elaine.bing...@gmail.com wrote: You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar *.xml. Or I split the file if it is too big. Elaine On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: Hi, I am new to Solr. I am able to index, search and update with files of small size (around 500 MB). But if I try to index a file of 5 GB or more, it gives a memory heap exception. While investigating I found that the post jar and post.sh load the whole file into memory. I used a workaround of dividing the file into small files, and it's working. Is there any other way to post a large file, as the above workaround is not feasible for a 1 TB file? Thanks -Pravin -- - Noble Paul | Principal Engineer| AOL | http://aol.com
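The core of such a program is just bounded-memory reading: grab a fixed-size chunk, ship it, repeat. A minimal sketch of the reading side in plain Python (the HTTP POST to Solr is deliberately left out):

```python
def read_in_chunks(path, chunk_size=1024 * 1024):
    # Yield the file a bounded chunk at a time; memory use stays at
    # roughly chunk_size no matter how large the file is.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk  # in a real indexer: write this to the HTTP request body

# usage sketch (connection is hypothetical):
# for chunk in read_in_chunks("huge-dump.xml"):
#     connection.send(chunk)
```

With chunked transfer encoding on the HTTP side, neither the client nor a well-behaved server ever needs the whole file in memory at once.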
Re: how to post(index) large file of 5 GB or greater than this
Are you indexing multiple documents? If so, split them into multiple files. A single XML file with all documents is not a good idea; Solr is designed to use batches for indexing. It would be extremely hard to index a 1 TB XML file. I would guess that would need a JVM heap of well over 1 TB. wunder On Oct 8, 2009, at 6:56 AM, Noble Paul നോബിള് नोब्ळ् wrote: you can write a simple program which streams the file from the disk to post it to Solr On Thu, Oct 8, 2009 at 7:10 PM, Elaine Li elaine.bing...@gmail.com wrote: You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar *.xml. Or I split the file if it is too big. Elaine On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: [original question quoted above - snipped] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
correct syntax for boolean search
Hi, What is the correct syntax for the following boolean search from a field? fieldname1:(word_a1 or word_b1) (word_a2 or word_b2) (word_a3 or word_b3) fieldname2:. Thanks. Elaine
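For the archive: in the Lucene query syntax the boolean operators must be uppercase (a lowercase "or" is parsed as a search term), and each parenthesized group needs its own field prefix, otherwise the group falls back to the default field. So the query above would presumably be written as:

```
fieldname1:(word_a1 OR word_b1) AND fieldname1:(word_a2 OR word_b2) AND fieldname1:(word_a3 OR word_b3) AND fieldname2:(...)
```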
Re: Default query parameter for one core
On Wed, Oct 7, 2009 at 1:46 PM, Michael solrco...@gmail.com wrote: Is there a way to not have the shards param at all for most cores, and for core0 to specify it? E.g. core0 requests always get a shards=foo appended, while other cores don't have an shards param at all. Or, barring that, is there a way to tell one core use this chunk of XML for your defaults tag, and tell the other cores use this other chunk of XML for your defaults tag?
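One approach (a sketch; the handler name and shard list are made up) relies on each core pointing at its own solrconfig.xml: only core0's config carries the shards default, and the other cores' configs simply omit it, so their requests have no shards param at all:

```xml
<!-- solrconfig.xml for core0 only -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="shards">shard1:8983/solr,shard2:8983/solr</str>
  </lst>
</requestHandler>
```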
Re: how to post(index) large file of 5 GB or greater than this
What is this huge file? Solr XML? CSV? Anyway, if it's a local file, you can get Solr to directly read/stream it via stream.file. There are examples in http://wiki.apache.org/solr/UpdateCSV, but it should work for any update format, not just CSV. -Yonik http://www.lucidimagination.com On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: Hi, I am new to Solr. I am able to index, search and update with files of small size (around 500 MB). But if I try to index a file of 5 GB or more, it gives a memory heap exception. While investigating I found that the post jar and post.sh load the whole file into memory. I used a workaround of dividing the file into small files, and it's working. Is there any other way to post a large file, as the above workaround is not feasible for a 1 TB file? Thanks -Pravin
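Concretely, the request would look something like this (host, path and file name are illustrative; the stream.file and commit parameters are the ones documented on the UpdateCSV wiki page):

```
curl "http://localhost:8983/solr/update/csv?stream.file=/tmp/huge-dump.csv&commit=true"
```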
Re: how can I use debugQuery if I have extended QParserPlugin?
I did check the other posts, as well as whatever I could find on the net, but didn't find anything. Has anyone encountered this type of issue, or is what I am doing (extending QParserPlugin) that unusual? gdeconto wrote: ... one thing I noticed is that if I append debugQuery=true to a query that includes the virtual function, I get a NullPointerException, likely because the debugging code looks at the query passed in and not the expanded query that my code generates and that gets used by solr for retrieving data. ... -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25803277.html Sent from the Solr - User mailing list archive at Nabble.com.
UTF-8 and latin accents
Hello list, I'm trying to index documents with Latin accents (Italian documents). I extract the text from .doc documents with Tika directly into .xml files. If I open up the XML document with Dashcode (I run Mac OS X) I can see the characters correctly. My XML document has the <?xml version="1.0" encoding="UTF-8"?> and <add><doc> ... headers. When I search and retrieve documents in Solr, the accented characters are replaced by a '?'. What is the problem? I guess the problem could be in (1) the schema or (2) the encoding of the XML document file itself (I don't see the characters correctly if I open it up with vim in a terminal). Any suggestions? thanks -- Claudio Martella Digital Technologies Unit Research Development - Engineer TIS innovation park Via Siemens 19 | Siemensstr. 19 39100 Bolzano | 39100 Bozen Tel. +39 0471 068 123 Fax +39 0471 068 129 claudio.marte...@tis.bz.it http://www.tis.bz.it
Re: UTF-8 and latin accents
On Thu, Oct 8, 2009 at 12:48 PM, Claudio Martella claudio.marte...@tis.bz.it wrote: I'm trying to index documents with Latin accents (Italian documents). I extract the text from .doc documents with Tika directly into .xml files. If I open up the XML document with Dashcode (I run Mac OS X) I can see the characters correctly. My XML document has the <?xml version="1.0" encoding="UTF-8"?> and <add><doc> ... headers. Maybe those documents aren't actually in UTF-8. Why don't you try Solr's example/exampledocs/utf8-example.xml When I search and retrieve documents in Solr, the accented characters are replaced by a '?'. What is the problem? I guess the problem could be in (1) the schema or (2) the encoding of the XML document file itself (I don't see the characters correctly if I open it up with vim in a terminal). in vim/gvim try :set encoding=utf8 -Yonik http://www.lucidimagination.com
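Yonik's suspicion can be checked directly: if the bytes were written as Latin-1 but the XML header declares UTF-8, a UTF-8 reader cannot decode the accented characters. A minimal illustration (the Italian word is just an example):

```python
# Sketch: why a file can "look fine" in one editor and still break in Solr.
# 'perché' encoded as Latin-1 uses the single byte 0xE9 for 'é'; that byte
# sequence is invalid UTF-8, so a UTF-8 reader substitutes a replacement char.
text = "perché"
latin1_bytes = text.encode("latin-1")   # b'perch\xe9'
utf8_bytes = text.encode("utf-8")       # b'perch\xc3\xa9'

decoded = latin1_bytes.decode("utf-8", errors="replace")
print(decoded)  # the accent is lost, much like the '?' seen in search results
```

The XML declaration only states an intent; the bytes on disk decide. Re-saving the files as genuine UTF-8 (or declaring the actual encoding in the header) resolves the mismatch.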
Re: ISOLatin1AccentFilter before or after Snowball?
Hello, I'm following the thread, but I think it still hasn't been answered whether the ISOLatin filter goes before or after the stemmer. Any direct answer? Koji Sekiguchi wrote: In this particular case, I don't think one is better than the other... In general, MappingCharFilter is more flexible than specific TokenFilters, such as ISOLatin1AccentFilter. For example, if you want your own character mapping rules, you can add them to mapping.txt. It should be easier than modifying TokenFilters as you don't need programming. Koji Chantal Ackermann wrote: Now, you got me wondering - which one should I like better? I didn't even know there is an alternative. :-) Chantal Koji Sekiguchi schrieb: No, ISOLatin1AccentFilterFactory is not deprecated. You can use either MappingCharFilterFactory+mapping-ISOLatin1Accent.txt or ISOLatin1AccentFilterFactory, whichever you'd like. Koji Jay Hill wrote: Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory deprecated in favor of: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> in 1.4? -Jay http://www.lucidimagination.com -- Claudio Martella Digital Technologies Unit Research Development - Engineer TIS innovation park Via Siemens 19 | Siemensstr. 19 39100 Bolzano | 39100 Bozen Tel. +39 0471 068 123 Fax +39 0471 068 129 claudio.marte...@tis.bz.it http://www.tis.bz.it
Re: correct syntax for boolean search
q=+fieldname1:(+(word_a1 word_b1) +(word_a2 word_b2) +(word_a3 word_b3)) +fieldname2:... Cheers Avlesh On Thu, Oct 8, 2009 at 7:40 PM, Elaine Li elaine.bing...@gmail.com wrote: Hi, What is the correct syntax for the following boolean search from a field? fieldname1:(word_a1 or word_b1) (word_a2 or word_b2) (word_a3 or word_b3) fieldname2:. Thanks. Elaine
Re: how can I use debugQuery if I have extended QParserPlugin?
On Thu, Oct 8, 2009 at 12:14 PM, gdeconto gerald.deco...@topproducer.com wrote: I did check the other posts, as well as whatever I could find on the net but didnt find anything. Has anyone encountered this type of issue, or is what I am doing (extending QParserPlugin) that unusual?? I think you need to provide some more information such as a stack trace for the NPE, or a more elaborate description of what you think the problem is with the debug component. You said because the debugging code looks at the query passed in and not the expanded query, but I don't understand that. The debug component is passed the actual Query object that the QParserPlugin created. -Yonik http://www.lucidimagination.com gdeconto wrote: ... one thing I noticed is that if I append debugQuery=true to a query that includes the virtual function, I get a NullPointerException, likely because the debugging code looks at the query passed in and not the expanded query that my code generates and that gets used by solr for retrieving data. ... -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25803277.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: IndexWriter InfoStream in solrconfig not working
I can't get it to work either, so I reopened https://issues.apache.org/jira/browse/SOLR-1145 -Yonik http://www.lucidimagination.com On Wed, Oct 7, 2009 at 1:45 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: I had the same problem. I'd be very interested to know how to get this working... -Gio. -Original Message- From: Burton-West, Tom [mailto:tburt...@umich.edu] Sent: Wednesday, October 07, 2009 12:13 PM To: solr-user@lucene.apache.org Subject: IndexWriter InfoStream in solrconfig not working Hello, We are trying to debug an indexing/optimizing problem and have tried setting the infoStream file in solrconfig.xml so that the SolrIndexWriter will write a log file. Here is our setting: <!-- To aid in advanced debugging, you may turn on IndexWriter debug logging. Uncommenting this and setting to true will set the file that the underlying Lucene IndexWriter will write its debug infostream to. --> <infoStream file="/tmp/LuceneIndexWriterDebug.log">true</infoStream> After making that change to solrconfig.xml and restarting Solr, we see a message in the Tomcat logs saying that the log is enabled: build-2_log.2009-10-06.txt:INFO: IndexWriter infoStream debug log is enabled: /tmp/LuceneIndexWriterDebug.log However, if we then run an optimize we can't see any log file being written. I also looked at the patch for http://issues.apache.org/jira/browse/SOLR-1145, but did not see a unit test that I might try to run in our system. Do others have this logging working successfully? Is there something else that needs to be set up? Tom
Re: IndexWriter InfoStream in solrconfig not working
OK, move the infoStream part in solrconfig.xml from indexDefaults into mainIndex and it should work. -Yonik http://www.lucidimagination.com On Thu, Oct 8, 2009 at 2:40 PM, Yonik Seeley yonik.see...@lucidimagination.com wrote: I can't get it to work either, so I reopened https://issues.apache.org/jira/browse/SOLR-1145
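Applying Yonik's fix, the element from the quoted mail moves inside the mainIndex section (a sketch of the relevant fragment only; the file path is the one from Tom's message):

```xml
<mainIndex>
  <!-- other mainIndex settings ... -->
  <!-- infoStream is picked up here, not under indexDefaults -->
  <infoStream file="/tmp/LuceneIndexWriterDebug.log">true</infoStream>
</mainIndex>
```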
releasing memory?
Hello- I have an application that can run in the background on a user's desktop -- it will go through phases of being used and not being used. I want to free as many system resources as possible when it is not in use. Currently I have a timer that waits for 10 mins of inactivity and releases a bunch of memory (unrelated to Lucene/Solr). Any suggestion on the best way to do this in Lucene/Solr? Perhaps reload a core? thanks for any pointers ryan
Re: Scoring for specific field queries
Hmm... I don't quite get the desired results. Those starting with cha are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting:

<fieldType name="autoComplete" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="autoComplete2" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My query is this: q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization.
Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone? -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html Sent from the Solr - User mailing list archive at Nabble.com.
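Sandeep's indexOf idea can be sketched client-side (the titles and helper here are made up; this only illustrates the desired ordering, not a Solr custom sort implementation): rank each title by where the typed prefix first occurs, so titles that start with it come first.

```python
def suggest_order(titles, prefix):
    """Order suggestions by the position of `prefix` (case-insensitive):
    titles starting with it first, later occurrences next, non-matches last."""
    p = prefix.lower()

    def key(title):
        pos = title.lower().find(p)
        return pos if pos >= 0 else len(title)  # non-matches sink to the end

    return sorted(titles, key=key)  # sorted() is stable, so ties keep order

titles = ["We Are the Champions", "Champion of the World", "Chasing Cars"]
print(suggest_order(titles, "cha"))
# → ['Champion of the World', 'Chasing Cars', 'We Are the Champions']
```

In Solr terms this corresponds to sorting ascending on a field holding the precomputed indexOf(keyword) value, as Sandeep suggests.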
Re: delay while adding document to solr index
On Thu, Oct 8, 2009 at 1:58 AM, swapna_here swapna.here...@gmail.com wrote: i don't understand why my solr index is increasing daily when i am adding and deleting the same number of documents daily A delete is just a bit flip, and does not reclaim disk space immediately. Deleted documents are squeezed out when segment merges happen (including an optimize, which merges all segments). If you have large segments that documents are deleted from, those segments may not be involved in a merge and hence the deleted docs can hang around for quite some time. -Yonik http://www.lucidimagination.com i run org.apache.solr.client.solrj.SolrServer.optimize() manually four times a day. is it not the right way to run optimize? if yes, what is the procedure to run optimize? thanks in advance :) -- View this message in context: http://www.nabble.com/delay-while-adding-document-to-solr-index-tp25676777p25798789.html Sent from the Solr - User mailing list archive at Nabble.com.
indexing frequently-changing fields
I am using Solr to index data in a SQL database. Most of the data doesn't change after initial commit, except for a single boolean field that indicates whether an item is flagged as 'needing attention'. So I have a need_attention field in the database that I update whenever a user marks an item as needing attention in my UI. The problem I have is that I want to offer the ability to include need_attention in my user's queries, but do not want to incur the expense of having to reindex whenever this flag changes on an individual document. I have thought about different solutions to this problem, including using multi-core and having a smaller core for recently-marked items that I am willing to do 'near-real-time' commits on. Are there are any common solutions to this problem, which I have to imagine is common in this community?
Re: indexing frequently-changing fields
It's a bit round-about but you might be able to use ExternalFileField http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html The fieldType definition would look like <fieldType name="file" keyField="id" defVal="1" stored="false" indexed="false" class="solr.ExternalFileField" valType="float"/> Then you can use frange to include/exclude certain values: http://www.lucidimagination.com/blog/tag/frange/ -Yonik http://www.lucidimagination.com On Thu, Oct 8, 2009 at 4:59 PM, didier deshommes dfdes...@gmail.com wrote: I am using Solr to index data in a SQL database. Most of the data doesn't change after initial commit, except for a single boolean field that indicates whether an item is flagged as 'needing attention'. So I have a need_attention field in the database that I update whenever a user marks an item as needing attention in my UI. The problem I have is that I want to offer the ability to include need_attention in my user's queries, but do not want to incur the expense of having to reindex whenever this flag changes on an individual document. I have thought about different solutions to this problem, including using multi-core and having a smaller core for recently-marked items that I am willing to do 'near-real-time' commits on. Are there are any common solutions to this problem, which I have to imagine is common in this community?
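For the need_attention use case, the external file is just uniqueKey=value lines that Solr reloads on commit, so flipping the flag becomes a file rewrite rather than a document reindex. A sketch of generating it (the doc ids and path are hypothetical; the external_<fieldname> naming convention should be checked against the ExternalFileField javadoc):

```python
def write_external_field(path, flags):
    """Write an ExternalFileField data file: one 'docid=value' line per
    document. Solr re-reads the file on commit, so updating the flag is a
    rewrite of this file instead of a reindex of the document."""
    with open(path, "w") as f:
        for doc_id, needs_attention in flags.items():
            f.write(f"{doc_id}={1.0 if needs_attention else 0.0}\n")

# hypothetical ids; the real file would live in Solr's index data directory
write_external_field("/tmp/external_need_attention", {"doc1": True, "doc2": False})
```

An frange filter over this field (as in Yonik's link) can then include or exclude the flagged documents at query time.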
RE: Problems with WordDelimiterFilterFactory
Here's the query and the error - Oct 09 08:20:17 [debug] [196] Solr query string:(Asia -- Civilization AND status_i:(2)) Oct 09 08:20:17 [debug] [196] Solr sort by: score desc Oct 09 08:20:17 [error] Error on searching: 400 Status: org.apache.lucene.queryParser.ParseException: Cannot parse ' (Asia -- Civilization AND status_i:(2)) ': Encount Bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 12:48 PM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Bern, I am interested on the solr query. In other words, the query that your system sends to solr. Thanks, Christian On Oct 7, 2009, at 5:56 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601 Either scroll down and click one of the television broadcasting -- asia links, or type it in the Quick Search box. TIA bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 9:43 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Could you please provide the exact URL of a query where you are experiencing this problem? eg(Not URL encoded): q=fieldName:hot and cold: temperatures On 10/07/2009 05:32 PM, Bernadette Houghton wrote: We are having some issues with our solr parent application not retrieving records as expected. For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine. Ditto if the user searches for a query containing hyphens, e.g. asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of hyphen) doesn't work. 
Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Bernadette Houghton, Library Business Applications Developer Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230 Fax: 03 5227 8000 International: +61 3 5227 8000 MSN: bern_hough...@hotmail.com Email: bernadette.hough...@deakin.edu.au Website: http://www.deakin.edu.au Deakin University CRICOS Provider Code 00113B (Vic)
RE: Problems with WordDelimiterFilterFactory
Sorry, the last line was truncated - HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered - at line 1, column 7. Was expecting one of: ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ...
Re: Problems with WordDelimiterFilterFactory
Hi Bern, the problem is the character sequence --. A query is not allowed to contain consecutive minus characters. Remove one minus character and the query will be parsed without problems. Because of this parsing problem, I'd recommend a query cleanup before submitting to the Solr server that replaces each sequence of minus characters with a single one. Regards, Patrick Bernadette Houghton schrieb: Sorry, the last line was truncated - HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered - at line 1, column 7. Was expecting one of: ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ...
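Patrick's suggested cleanup can be sketched as a small helper (illustrative only): collapse any run of minus characters into one before the query is sent to Solr.

```python
import re

def clean_query(q):
    """Collapse runs of '-' so the Lucene query parser doesn't choke on '--'."""
    return re.sub(r"-{2,}", "-", q)

print(clean_query("(Asia -- Civilization AND status_i:(2))"))
# → (Asia - Civilization AND status_i:(2))
```

Escaping the hyphens (or quoting the phrase) would be an alternative if the literal '--' must reach the index.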
RE: Sorting by insertion time
Hi Tarjei, See https://issues.apache.org/jira/browse/SOLR-1478 - with trunk Solr (and soon, 1.4), you can use the pseudo-field _docid_ for this purpose. Steve -Original Message- From: tarjei [mailto:tar...@nu.no] Sent: Thursday, October 08, 2009 2:18 AM To: solr-user@lucene.apache.org Subject: Sorting by insertion time Hi, Quite often I want a set of documents ordered by the time they were inserted, i.e. give me the 5 latest items that match query foo. I usually solve this by sorting on a date field. I had a chat with Erik Hatcher when he visited JavaZone 2009 and he said that Solr places documents on disk in insertion order. This would make it possible for me to save a sorting step by sorting not on a specific field, but by insertion order in reverse. AFAIK Lucene knows how to do this, but which request parameters should I use in Solr? Kind regards, Tarjei -- Tarjei Huse Mobil: 920 63 413
RE: Problems with WordDelimiterFilterFactory
Thanks for this, marklo; it is a *very* useful page. bern -Original Message- From: marklo [mailto:mar...@pcmall.com] Sent: Thursday, 8 October 2009 1:10 PM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Use http://solr-url/solr/admin/analysis.jsp to see how your data is indexed/queried -- View this message in context: http://www.nabble.com/Problems-with-WordDelimiterFilterFactory-tp25795589p25797377.html Sent from the Solr - User mailing list archive at Nabble.com.
[slightly off topic] Jetty and NIO
In the Solr example jetty.xml, there is the following setup and comments:

<!-- Use this connector for many frequently idle connections and for threadless continuations.
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.nio.SelectChannelConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <Set name="maxIdleTime">3</Set>
      <Set name="Acceptors">2</Set>
      <Set name="confidentialPort">8443</Set>
    </New>
  </Arg>
</Call>
-->

<!-- Use this connector if NIO is not available. -->
<!-- This connector is currently being used for Solr because the
     nio.SelectChannelConnector showed poor performance under WindowsXP
     from a single client with non-persistent connections
     (35s vs ~3min to complete 10,000 requests) -->
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <Set name="maxIdleTime">5</Set>
      <Set name="lowResourceMaxIdleTime">1500</Set>
    </New>
  </Arg>
</Call>

So, if I'm on CentOS 2.6 (64-bit), what connector should I be using? Based on the comments, I'm not sure the top one is the right thing either, but it also sounds like it is my only other choice. The other thing I'm noticing is that if I profile my app and I am retrieving something like 50 rows at a time, 30-60% of the time is spent in org.mortbay.jetty.bio.SocketConnector$Connection.fill(). I realize the answer may just be to get fewer results, but I was wondering if there are other tuning parameters that can make this more efficient, b/c the 50 rows thing is a biz. reqt and I may not be able to get that changed. Thanks, Grant
RE: Problems with WordDelimiterFilterFactory
Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen. Changing this part of our schema.xml -

<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>

to

<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement=" " replace="all"/>

i.e. replacing non-alpha chars with a space, looks like it may handle that aspect. Regards Bern -Original Message- From: Patrick Jungermann [mailto:patrick.jungerm...@googlemail.com] Sent: Friday, 9 October 2009 9:03 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Hi Bern, the problem is the character sequence "--". A query is not allowed to have minus characters that directly follow one another. Remove one minus character and the query will be parsed without problems. Because of this parsing problem, I'd recommend a query cleanup, before the submit to the Solr server, that replaces each sequence of minus characters with a single one. Regards, Patrick Bernadette Houghton schrieb: Sorry, the last line was truncated - HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
-Original Message- From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] Sent: Friday, 9 October 2009 8:22 AM To: 'solr-user@lucene.apache.org' Subject: RE: Problems with WordDelimiterFilterFactory Here's the query and the error - Oct 09 08:20:17 [debug] [196] Solr query string:(Asia -- Civilization AND status_i:(2)) Oct 09 08:20:17 [debug] [196] Solr sort by: score desc Oct 09 08:20:17 [error] Error on searching: 400 Status: org.apache.lucene.queryParser.ParseException: Cannot parse ' (Asia -- Civilization AND status_i:(2)) ': Encount Bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 12:48 PM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Bern, I am interested on the solr query. In other words, the query that your system sends to solr. Thanks, Christian On Oct 7, 2009, at 5:56 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601 Either scroll down and click one of the television broadcasting -- asia links, or type it in the Quick Search box. TIA bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 9:43 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Could you please provide the exact URL of a query where you are experiencing this problem? eg(Not URL encoded): q=fieldName:hot and cold: temperatures On 10/07/2009 05:32 PM, Bernadette Houghton wrote: We are having some issues with our solr parent application not retrieving records as expected. For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine. Ditto if the user searches for a query containing hyphens, e.g. 
asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of hyphen) doesn't work. Our schema.xml contains the following - fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ !-- in this example, we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- filter class=solr.ISOLatin1AccentFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.ISOLatin1AccentFilterFactory/
Re: [slightly off topic] Jetty and NIO
On Thu, Oct 8, 2009 at 6:24 PM, Grant Ingersoll gsing...@apache.org wrote: So, if I'm on CentOS 2.6 (64-bit), what connector should I be using? Based on the comments, I'm not sure the top one is the right thing either, but it also sounds like it is my only other choice. Right - the connector that Solr uses in the example is fine for typical Solr uses - NIO won't help. The other thing I'm noticing is if I profile my app and I am retrieving something like 50 rows at a time, 30-60% of the time is spent in org.mortbay.jetty.bio.SocketConnector$Connection.fill(). On the Solr server side? That's code that *reads* a request from the client... so if a lot of time is being spent there, it's probably blocking waiting for the rest of the request? The tests could be network bound, or the test client may not be fast enough? If we are saturating the network connection, then use SolrJ w/ the binary response format if you're not already, or use something like the JSON format otherwise. If you end up using a text response format, you could try enabling compression for responses (not sure how with jetty). -Yonik http://www.lucidimagination.com I realize the answer may just be to get fewer results, but I was wondering if there are other tuning parameters that can make this more efficient b/c the 50 rows thing is a biz. reqt and I may not be able to get that changed. Thanks, Grant
Re: how can I use debugQuery if I have extended QParserPlugin?
Hi Yonik; My original post ( http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tt25789546.html ) has the stack trace. =^D I am having trouble reproducing this issue consistently (I sometimes don't get the NPE) so will have to track this down a bit more. Luckily, someone just showed me how to debug the core solr files with Eclipse. Hopefully I can now figure this out on my own. thx Yonik Seeley-2 wrote: I think you need to provide some more information such as a stack trace for the NPE, or a more elaborate description of what you think the problem is with the debug component. You said "because the debugging code looks at the query passed in and not the expanded query", but I don't understand that. The debug component is passed the actual Query object that the QParserPlugin created. -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25812899.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problems with WordDelimiterFilterFactory
Bern, The only way that could be happening is if you are not using the field type you described on your original e-mail. The TokenFilter WordDelimiterFilterFactory should take care of the hyphen. On 10/08/2009 05:30 PM, Bernadette Houghton wrote: Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen. Changing this part of our schema.xml - filter class=solr.PatternReplaceFilterFactory pattern=([^a-z]) replacement= replace=all / To filter class=solr.PatternReplaceFilterFactory pattern=([^a-z]) replacement= replace=all / i.e. replacing non-alpha chars with a space, looks like it may handle that aspect. Regards Bern -Original Message- From: Patrick Jungermann [mailto:patrick.jungerm...@googlemail.com] Sent: Friday, 9 October 2009 9:03 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Hi Bern, the problem is the character sequence --. A query is not allowed to have minus characters that consequent upon another one. Remove one minus character and the query will be parsed without problems. Because of this parsing problem, I'd recommend a query cleanup before the submit to the Solr server that replaces each sequence of minus characters by a single one. Regards, Patrick Bernadette Houghton schrieb: Sorry, the last line was truncated - HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered - at line 1, column 7. Was expecting one of: ( ... * ...QUOTED ...TERM ...PREFIXTERM ...WILDTERM ... [ ... { ...NUMBER ... 
-Original Message- From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] Sent: Friday, 9 October 2009 8:22 AM To: 'solr-user@lucene.apache.org' Subject: RE: Problems with WordDelimiterFilterFactory Here's the query and the error - Oct 09 08:20:17 [debug] [196] Solr query string:(Asia -- Civilization AND status_i:(2)) Oct 09 08:20:17 [debug] [196] Solr sort by: score desc Oct 09 08:20:17 [error] Error on searching: 400 Status: org.apache.lucene.queryParser.ParseException: Cannot parse ' (Asia -- Civilization AND status_i:(2)) ': Encount Bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 12:48 PM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Bern, I am interested on the solr query. In other words, the query that your system sends to solr. Thanks, Christian On Oct 7, 2009, at 5:56 PM, Bernadette Houghtonbernadette.hough...@deakin.edu.au wrote: Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601 Either scroll down and click one of the television broadcasting -- asia links, or type it in the Quick Search box. TIA bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 9:43 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Could you please provide the exact URL of a query where you are experiencing this problem? eg(Not URL encoded): q=fieldName:hot and cold: temperatures On 10/07/2009 05:32 PM, Bernadette Houghton wrote: We are having some issues with our solr parent application not retrieving records as expected. For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine. Ditto if the user searches for a query containing hyphens, e.g. 
asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of hyphen) doesn't work. Our schema.xml contains the following - fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ !-- in this example, we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- filter class=solr.ISOLatin1AccentFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer
concatenating tokens
hello *, I'm using a combination of tokenizers and filters that give me the desired tokens; however, for a particular field I want to concatenate these tokens back into a single string. Is there a filter to do that? If not, what are the steps needed to make my own filter to concatenate tokens? For example, I start with "Sprocket (widget) - Blue"; the analyzers churn out the tokens [sprocket, widget, blue]; I want to end up with the string "sprocket widget blue". This is a simple example, and in the general case lowercasing and punctuation removal does not work, hence why I'm looking to concatenate tokens. --joe
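As far as I know there is no stock Solr filter for this, so it would mean a custom TokenFilter whose behavior is: buffer every token the upstream analyzers emit, then output one token joining them at end of stream. The core logic is just a join; a minimal Python sketch of that behavior (the token list stands in for the real analyzer output):

```python
def concatenate_tokens(tokens, sep=" "):
    """What a concatenating TokenFilter would do: consume the whole
    token stream and emit a single token joining the pieces."""
    return sep.join(tokens)

# The analyzers turn "Sprocket (widget) - Blue" into these tokens:
tokens = ["sprocket", "widget", "blue"]
print(concatenate_tokens(tokens))  # sprocket widget blue
```

In a real custom filter the same buffering-and-joining would happen in the filter's token-production method, emitting the joined token once the upstream stream is exhausted.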
Re: [slightly off topic] Jetty and NIO
On Oct 8, 2009, at 7:37 PM, Yonik Seeley wrote: On Thu, Oct 8, 2009 at 6:24 PM, Grant Ingersoll gsing...@apache.org wrote: So, if I'm on Centos 2.6 (64 bit), what connector should I be using? Based on the comments, I'm not sure the top one is the right thing either, but it also sounds like it is my only other choice. Right - the connector that Solr uses in the example is fine for typical Solr uses - NIO won't help. The other thing I'm noticing is if I profile my app and I am retrieving something like 50 rows at a time, 30-60% of the time is spent in org.mortbay.jetty.bio.SocketConnector$Connection.fill(). On the Solr server side? Yes. That's code that *reads* a request from the client... If I change nothing else and set rows=10, the time spent in .fill() is 5%. I'll double check everything on my end. so if a lot of time is being spent there, it's probably blocking waiting for the rest of the request? The tests could be network bound, or the test client may not be fast enough? If we are saturating the network connection, then use SolrJ if you're not, w/ the binary response format, or use something like JSON format otherwise. If you end up using a text response format, you could try enabling compression for responses (not sure how with jetty).
multi-word synonyms and analysis.jsp vs real field analysis (query, index)
Hi list, I worked on a field type and its analyzing chain, in which I want to use the SynonymFilter with entries similar to: foo bar => foo_bar. During the analysis phase, I used the /admin/analysis.jsp view to test the analyzing results produced by the created field type. The output shows that a query "foo bar" will first be separated by the WhitespaceTokenizer into the two tokens foo and bar, and that the SynonymFilter will replace both tokens with foo_bar. But as I tried this at real query time with the request handler standard and also with dismax, the tokens foo and bar were not replaced. The parsedQueryString was something similar to field:foo field:bar. At index time, it works as expected. Has anybody experienced this and/or knows a workaround or a solution for it? Thanks, Patrick
Re: issue in adding data to a multivalued field
Hi Rakhi, Use multiValued (capital V), not multivalued. :) Koji Rakhi Khatwani wrote: Hi, I have a small schema with some of the fields defined as:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="content" type="text" indexed="true" stored="true" multivalued="false"/>
<field name="author_name" type="text" indexed="true" stored="false" multivalued="true"/>

where the field author_name is multivalued. However, in the UI (schema browser), following are the details of the author_name field; it's nowhere mentioned that it's multivalued. Field: author_name Field Type: text Properties: Indexed, Tokenized. When I try creating and adding a document into solr, I get an exception: ERROR_id1_multiple_values_encountered_for_non_multiValued_field_author_name_ninad_raakhi_goureya_sheetal. Here's my code snippet:

solrDoc17.addField("id", "id1");
solrDoc17.addField("content", "SOLR");
solrDoc17.addField("author_name", "ninad");
solrDoc17.addField("author_name", "raakhi");
solrDoc17.addField("author_name", "goureya");
solrDoc17.addField("author_name", "sheetal");
server.add(solrDoc17);
server.commit();

Any pointers?? regards, Raakhi
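With the attribute spelled correctly (unrecognized attributes like "multivalued" are silently ignored, so the fields fall back to the non-multiValued default), the two broken declarations become (other values unchanged from the original schema):

```
<field name="content" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="author_name" type="text" indexed="true" stored="false" multiValued="true"/>
```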
DIH: Setting rows= on full-import has no effect
In the past setting rows=n with the full-import command has stopped the DIH importing at the number I passed in, but now this doesn't seem to be working. Here is the command I'm using:

curl 'http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100'

But when 100 docs are imported the process keeps running. Here's the log output:

Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 100
Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 200
Oct 8, 2009 5:23:35 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 300
Oct 8, 2009 5:23:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 400
Oct 8, 2009 5:23:38 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 500

and so on. Running on the most recent nightly: 1.4-dev 823366M - jayhill - 2009-10-08 17:31:22. I've used that exact url in the past and the indexing stopped at the rows number as expected, but I haven't run the command for about two months on a build from back in early July. Here's the dih config:

<dataConfig>
  <dataSource name="dsFiles" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" baseDir="/path/to/files"
            fileName=".*xml" recursive="true" rootEntity="false" dataSource="null">
      <entity name="wikixml" processor="XPathEntityProcessor" forEach="/mediawiki/page"
              url="${f.fileAbsolutePath}" dataSource="dsFiles" onError="skip">
        <field column="id" xpath="/mediawiki/page/id"/>
        <field column="title" xpath="/mediawiki/page/title"/>
        <field column="contributor" xpath="/mediawiki/page/revision/contributor/username"/>
        <field column="comment" xpath="/mediawiki/page/revision/comment"/>
        <field column="text" xpath="/mediawiki/page/revision/text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

-Jay
Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)
Patrick, parsedQueryString was something similar to field:foo field:bar. At index time, it works as expected. I guess because you are searching q=foo bar, this causes an OR query. Use q="foo bar" instead. Koji Patrick Jungermann wrote: Hi list, I worked on a field type and its analyzing chain, in which I want to use the SynonymFilter with entries similar to: foo bar => foo_bar. During the analysis phase, I used the /admin/analysis.jsp view to test the analyzing results produced by the created field type. The output shows that a query "foo bar" will first be separated by the WhitespaceTokenizer into the two tokens foo and bar, and that the SynonymFilter will replace both tokens with foo_bar. But as I tried this at real query time with the request handler standard and also with dismax, the tokens foo and bar were not replaced. The parsedQueryString was something similar to field:foo field:bar. At index time, it works as expected. Has anybody experienced this and/or knows a workaround or a solution for it? Thanks, Patrick
DIH Error in latest Nightly Builds
Hi All, I tried Indexing data and got the following error., Used Solr nightly Oct5th and nightly 8th, The same Configuration/query is working in Older version(May nightly Build) The db-data-config.xml has the simple Select query SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select CATALOG_ID, CATALOG_NUMBER, CATALOG_NAME, SEGMENTATION_ TYPE, BEGIN_OFFER_DATE, END_OFFER_DATE, FUTURE_BEGIN_DATE, FUTURE_END_DATE, ATONCE_BEGIN_DATE, ATONCE_END_DATE, REFERENCE_BEGIN_DATE, REFERENCE_END_DA TE, BEGIN_SEASON, LANGUAGE, COUNTRY, SIZE_TYPE, CURRENCY, DIVISION, LIFECYCLE, PRODUCT_CD, STYLE_CD, GLOBAL_STYLE_NAME, REGION_STYLE_NAME, NEW_STYLE, SIZE_RUN, COLOR_NBR, GLOBAL_COLOR_DESC, REGION_COLOR_DESC, WIDTH, CATEGORY, SUB_CATEGORY, CATEGORY_SUMMARY, CATEGORY_CORE_FOCUS, SPORT_ACTIVITY, SPORT _ACTIVITY_SUMMARY, GENDER_AGE, GENDER_AGE_SUMMARY, SILO, SILHOUETTE, SILHOUETTE_SUMMARY, SEGMENTATION_TIER, PRIMARY_COLOR, NEW_PRODUCT, CARRYOVER_PROD UCT, WHOLESALE_AMOUNT, RETAIL_AMOUNT, CATALOG_LAST_MOD_DATE, PRODUCT_LAST_MOD_DATE, STYLE_LAST_MOD_DATE, CATALOG_ID || '-' || PRODUCT_CD as UNIQ from prodsearch_atlasatgcombine Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:356) at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: java.sql.SQLException: Unsupported feature at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134) at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179) at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:269) at oracle.jdbc.dbaccess.DBError.throwUnsupportedFeatureSqlException(DBError.java:689) at oracle.jdbc.driver.OracleConnection.setHoldability(OracleConnection.java:3065) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:191) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128) at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363) at org.apache.solr.handler.dataimport.JdbcDataSource.access$300(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240) ... 11 more Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback 2009-10-08 18:31:12.149::INFO: Shutdown hook executing 2009-10-08 18:31:12.149::INFO: Shutdown hook complete Thanks and regards, JK
Re: Scoring for specific field queries
Use the field analysis tool to see how the data is being analyzed in both the fields. Cheers Avlesh On Fri, Oct 9, 2009 at 12:56 AM, R. Tan tanrihae...@gmail.com wrote: Hmm... I don't quite get the desired results. Those starting with "cha" are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting:

<fieldType name="autoComplete" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="autoComplete2" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My query is this:

q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0

What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query?
Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone? -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html Sent from the Solr - User mailing list archive at Nabble.com.
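A sketch of why the two-field trick in this thread favors starts-with matches. Note the configs above use NGramFilterFactory, which produces grams from every substring; with an edge-n-gram variant (Solr's EdgeNGramFilterFactory) the distinction is sharpest, because the Keyword-tokenized field then matches only true title prefixes, while the whitespace-tokenized field matches a prefix of any word. The helpers below are illustrative Python, not Solr's actual filter code:

```python
def edge_ngrams(token, min_n=1, max_n=20):
    """Prefix-only n-grams of a token, in the spirit of EdgeNGramFilterFactory."""
    return {token[:n] for n in range(min_n, min(max_n, len(token)) + 1)}

def matches_keyword_field(title, q):
    # KeywordTokenizer: the whole title is a single token,
    # so only titles that *start* with q produce a matching gram.
    return q in edge_ngrams(title.lower())

def matches_whitespace_field(title, q):
    # WhitespaceTokenizer: every word is a token,
    # so a prefix match anywhere in the title qualifies.
    return any(q in edge_ngrams(word) for word in title.lower().split())

print(matches_keyword_field("Champion of the world", "cha"))     # True
print(matches_keyword_field("We are the champions", "cha"))      # False
print(matches_whitespace_field("We are the champions", "cha"))   # True
```

Boosting the keyword field over the whitespace field (q=(autoComplete2:cha^10 autoComplete:cha)) then ranks titles that begin with the prefix above titles that merely contain a word with it.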
Re: DIH Error in latest Nightly Builds
raised an issue https://issues.apache.org/jira/browse/SOLR-1500 On Fri, Oct 9, 2009 at 7:10 AM, jayakeerthi s mail2keer...@gmail.com wrote: Hi All, I tried Indexing data and got the following error., Used Solr nightly Oct5th and nightly 8th, The same Configuration/query is working in Older version(May nightly Build) The db-data-config.xml has the simple Select query SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select CATALOG_ID, CATALOG_NUMBER, CATALOG_NAME, SEGMENTATION_ TYPE, BEGIN_OFFER_DATE, END_OFFER_DATE, FUTURE_BEGIN_DATE, FUTURE_END_DATE, ATONCE_BEGIN_DATE, ATONCE_END_DATE, REFERENCE_BEGIN_DATE, REFERENCE_END_DA TE, BEGIN_SEASON, LANGUAGE, COUNTRY, SIZE_TYPE, CURRENCY, DIVISION, LIFECYCLE, PRODUCT_CD, STYLE_CD, GLOBAL_STYLE_NAME, REGION_STYLE_NAME, NEW_STYLE, SIZE_RUN, COLOR_NBR, GLOBAL_COLOR_DESC, REGION_COLOR_DESC, WIDTH, CATEGORY, SUB_CATEGORY, CATEGORY_SUMMARY, CATEGORY_CORE_FOCUS, SPORT_ACTIVITY, SPORT _ACTIVITY_SUMMARY, GENDER_AGE, GENDER_AGE_SUMMARY, SILO, SILHOUETTE, SILHOUETTE_SUMMARY, SEGMENTATION_TIER, PRIMARY_COLOR, NEW_PRODUCT, CARRYOVER_PROD UCT, WHOLESALE_AMOUNT, RETAIL_AMOUNT, CATALOG_LAST_MOD_DATE, PRODUCT_LAST_MOD_DATE, STYLE_LAST_MOD_DATE, CATALOG_ID || '-' || PRODUCT_CD as UNIQ from prodsearch_atlasatgcombine Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:356) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: java.sql.SQLException: Unsupported feature at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134) at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179) at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:269) at oracle.jdbc.dbaccess.DBError.throwUnsupportedFeatureSqlException(DBError.java:689) at oracle.jdbc.driver.OracleConnection.setHoldability(OracleConnection.java:3065) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:191) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128) at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363) at org.apache.solr.handler.dataimport.JdbcDataSource.access$300(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240) ... 11 more Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback 2009-10-08 18:31:12.149::INFO: Shutdown hook executing 2009-10-08 18:31:12.149::INFO: Shutdown hook complete Thanks and regards, JK -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: DIH: Setting rows= on full-import has no effect
I have raised an issue http://issues.apache.org/jira/browse/SOLR-1501 On Fri, Oct 9, 2009 at 6:10 AM, Jay Hill jayallenh...@gmail.com wrote: In the past setting rows=n with the full-import command has stopped the DIH importing at the number I passed in, but now this doesn't seem to be working. Here is the command I'm using: curl ' http://localhost:8983/solr/indexer/mediawiki?command=full-importrows=100' But when 100 docs are imported the process keeps running. Here's the log output: Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 100 Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 200 Oct 8, 2009 5:23:35 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 300 Oct 8, 2009 5:23:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 400 Oct 8, 2009 5:23:38 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 500 and so on. Running on the most recent nightly: 1.4-dev 823366M - jayhill - 2009-10-08 17:31:22 I've used that exact url in the past and the indexing stopped at the rows number as expected, but I haven't run the command for about two months on a build from back in early July. 
Here's the DIH config:

<dataConfig>
  <dataSource name="dsFiles" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" baseDir="/path/to/files"
            fileName=".*xml" recursive="true" rootEntity="false" dataSource="null">
      <entity name="wikixml" processor="XPathEntityProcessor" forEach="/mediawiki/page"
              url="${f.fileAbsolutePath}" dataSource="dsFiles" onError="skip">
        <field column="id" xpath="/mediawiki/page/id"/>
        <field column="title" xpath="/mediawiki/page/title"/>
        <field column="contributor" xpath="/mediawiki/page/revision/contributor/username"/>
        <field column="comment" xpath="/mediawiki/page/revision/comment"/>
        <field column="text" xpath="/mediawiki/page/revision/text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

-Jay

--
Noble Paul | Principal Engineer | AOL | http://aol.com
RE: Solr Quries
Thanks for your help. Can you please provide the detailed configuration for a distributed Solr environment? How do I set up master and slave, and in which file(s) do I make the changes? What are the shard parameters? Can we integrate ZooKeeper with this? Please provide details. Thanks in advance.

-Pravin

-----Original Message-----
From: Sandeep Tagore [mailto:sandeep.tag...@gmail.com]
Sent: Wednesday, October 07, 2009 4:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Quries

Hi Pravin,

1. Does Solr work in a distributed environment? If yes, how do I configure it?
Yes. You can achieve this with sharding. For example, install and configure Solr on two machines and declare one of them as the master. Index your data across the shards and pass the shards parameter when you search.

2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS? (Note: I am familiar with Hadoop.)
Sorry, no idea.

3. I have 1 TB of employee information (id, name, address, cell no, personal info). To index this data into Solr, do I have to create an XML file with the data and post it to the server, or is there a more optimal way? In the future the data will grow to 10 TB; how can I index that? (Creating XML is a headache.)
I don't think XML is the best way, and I don't suggest it. If you have that 1 TB of data in a database, you can load it with a simple full-import command: configure your DB details in solrconfig.xml and data-config.xml, and add your DB driver JAR to Solr's lib directory. Then import the data in slices (say, department-wise, or by some other category). In the future, you can import the data from a DB, or index it directly via a client API with simple Java beans.

Hope this info helps you.

Regards,
Sandeep Tagore

--
View this message in context: http://www.nabble.com/Solr-Quries-tp25780371p25783891.html
Sent from the Solr - User mailing list archive at Nabble.com.
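As a starting point for the master/slave part of the question: Solr 1.4 ships an HTTP-based ReplicationHandler that is configured in solrconfig.xml on each box. A minimal sketch, assuming hostnames and conf file names you would replace with your own:

```xml
<!-- In the master's solrconfig.xml: replicate the index (and listed
     conf files) to slaves after each commit. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- In each slave's solrconfig.xml: poll the master for new versions.
     "master-host" is a placeholder. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Sharding, by contrast, is a query-time parameter rather than a config setting, e.g. shards=host1:8983/solr,host2:8983/solr on the select request. As for ZooKeeper: it is not integrated with Solr 1.4; that came later with SolrCloud.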