indexing unique keys
I have a use case where we want to store unique keys (hashes) that will be compared against another set of keys (hashes). For example:

Index set = { h1, h2, h3, h4 }
Comparison set = { h1, h2 }
Result set = { h1, h2 }

Would there be an advantage to storing the index set in Solr instead of a traditional database?

Thanks in advance,
Nipen Mark
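For reference, the comparison described in this question is a plain set intersection. The sketch below only illustrates the expected result; the function name `matching_keys` is made up for the example, and it says nothing about whether Solr or a database is the better store.

```python
def matching_keys(index_set, comparison_set):
    """Return the keys from comparison_set that are also present in index_set."""
    return set(index_set) & set(comparison_set)


index_set = {"h1", "h2", "h3", "h4"}
comparison_set = {"h1", "h2"}

result_set = matching_keys(index_set, comparison_set)
print(sorted(result_set))  # ['h1', 'h2']
```

Whatever store is used, each lookup is an exact match on a unique key, so no text analysis of the hash values should be involved.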
search hit on multivalued fields
I have a multivalued field Text which is indexed. For example:

F1: some value
F2: some value
Text = (content of F1, F2)

When a user searches, I query only the Text field, but I also need to show the user which field (F1 or F2) produced the search hit. Is this possible in Solr?

--
Thanks,
Nipen Mark
Re: filtering number and repeated contents
Thanks Jack, I will try an updateProcessor.

By the way, does Solr store the tokenized content of a field if the field has the property stored=true?

On Tue, Jun 5, 2012 at 8:23 PM, Jack Krupansky j...@basetechnology.com wrote:

> My (very limited) understanding of boilerpipe in Tika is that it strips out short text, which is great for all the menu and navigation text, but the typical disclaimer at the bottom of an email is not very short and frequently can be longer than the email message body itself. You may have to resort to a custom update processor that is programmed with some disclaimer signature text strings to be removed from field values.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Mark , N
> Sent: Tuesday, June 05, 2012 8:28 AM
> To: solr-user@lucene.apache.org
> Subject: filtering number and repeated contents
>
> Is it possible to filter out numbers and disclaimers (repeated content) while indexing to Solr? This is all surplus information and I do not want to index it.
>
> I have also tried the boilerpipe algorithm to remove surplus information from web pages, such as navigational elements, templates, and advertisements. I think it works well, but I would like to know whether I could filter out disclaimer information too, mainly in email texts.
>
> --
> Thanks,
> Nipen Mark

--
Thanks,
Nipen Mark
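The idea suggested in this thread (remove known disclaimer strings, and numbers, from field values before they are indexed) can be sketched as a plain pre-processing step. This is only an illustration in Python, not a Solr update processor; the signature strings and the function names are hypothetical, and a real list would have to come from your own mail corpus.

```python
import re

# Hypothetical disclaimer openings; replace with strings seen in your own data.
DISCLAIMER_SIGNATURES = [
    "This e-mail and any attachments are confidential",
    "Please consider the environment before printing",
]


def strip_disclaimers(text, signatures=DISCLAIMER_SIGNATURES):
    """Cut the text at the earliest disclaimer signature found, if any."""
    cut = len(text)
    for signature in signatures:
        match = re.search(re.escape(signature), text, re.IGNORECASE)
        if match:
            cut = min(cut, match.start())
    return text[:cut].rstrip()


def strip_numbers(text):
    """Remove standalone digit runs and collapse the whitespace left behind."""
    without_numbers = re.sub(r"\b\d+\b", " ", text)
    return re.sub(r"\s+", " ", without_numbers).strip()


body = "Meeting moved to room 42.\n\nThis e-mail and any attachments are confidential."
print(strip_numbers(strip_disclaimers(body)))  # Meeting moved to room .
```

Cutting at the first signature assumes the disclaimer always sits at the end of the message; if disclaimers can appear mid-text (for example inside quoted replies), the matching would need to be more selective.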
filtering number and repeated contents
Is it possible to filter out numbers and disclaimers (repeated content) while indexing to Solr? This is all surplus information and I do not want to index it.

I have also tried the boilerpipe algorithm to remove surplus information from web pages, such as navigational elements, templates, and advertisements. I think it works well, but I would like to know whether I could filter out disclaimer information too, mainly in email texts.

--
Thanks,
Nipen Mark
filtering footer information
Is it possible to filter out certain repeated footer information from text documents while indexing to Solr? Are there any built-in filters for this, similar to the stop word filters?

--
Thanks,
Nipen Mark
Re: wildcard and proximity searches
Hi,

Were you successful in trying SOLR-1604 to allow wildcard queries in phrases? Also, does this plugin allow us to use proximity with wildcards, for example "solr mail*"~10? Is this the right approach to support this functionality?

Thanks,
Mark

On Wed, Aug 4, 2010 at 2:24 PM, Frederico Azeiteiro frederico.azeite...@cision.com wrote:

> Thanks for your idea. At this point I'm logging each query time. My idea is to divide my queries into normal queries and heavy queries. I have some heavy queries that take 1 or 2 minutes to return results, but they contain, for instance, (*word1* AND *word2* AND word3*). I guess these will always be slower (they could be a little faster with ReversedWildcardFilterFactory), but they will never be ready in a few seconds. For now, I just increased the timeout for those :) (using solrnet).
>
> My priority at the moment is phrase queries like word1* word2* word3. After this is working, I'll try to optimize the heavy queries.
>
> Frederico
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Wednesday, August 4, 2010 01:41
> To: solr-user@lucene.apache.org
> Subject: Re: wildcard and proximity searches
>
> Frederico Azeiteiro wrote:
> > > But it is unusual to use both leading and trailing * operator. Why are you doing this?
> >
> > Yes I know, but I have a few queries that need this. I'll try the ReversedWildcardFilterFactory.
>
> ReversedWildcardFilter will help with a leading wildcard, but it will not help with a query that has BOTH a leading and a trailing wildcard; it'll still be slow. Solr/Lucene isn't good at that; I didn't even know Solr would do it at all, in fact.
>
> If you really needed to do that, the way to play to Solr/Lucene's way of doing things would be to have a field where you actually index each _character_ as a separate token. Then leading and trailing wildcard search is basically reduced to a phrase search, but where the words are actually characters.
>
> But then you're going to get an index where pretty much every token belongs to every document, which Solr isn't that great at either, but then you can apply CommonGrams on top to help that out a lot too. Not quite sure what the end result will be; I've never tried it. I'd only use that weird special char-as-token field for queries that actually required leading and trailing wildcards.
>
> Figuring out how to set up your analyzers, and what (if anything) you're going to have to do client-app-side to transform the user's query into something that'll end up searching like a phrase search where each 'word' is a character, is left as an exercise for the reader. :)
>
> Jonathan

-- Nipen Mark
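The character-as-token idea from this thread can be illustrated outside Solr with a small positional index: every character is a token, and a `*abc*` style search becomes a phrase match on consecutive character positions. This is only a toy model of the technique, not a Solr analyzer configuration; the function names and sample documents are invented for the example.

```python
from collections import defaultdict


def build_char_index(docs):
    """Map each character to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text.lower()):
            index[ch][doc_id].append(pos)
    return index


def phrase_search(index, chars):
    """Return ids of docs in which the characters occur at consecutive positions."""
    chars = chars.lower()
    if not chars:
        return set()
    hits = set()
    for doc_id, starts in index.get(chars[0], {}).items():
        for start in starts:
            # Every following character must sit exactly one position further on.
            if all(start + i in index.get(ch, {}).get(doc_id, ())
                   for i, ch in enumerate(chars)):
                hits.add(doc_id)
                break
    return hits


docs = {1: "keyword", 2: "sword", 3: "work"}
index = build_char_index(docs)
print(sorted(phrase_search(index, "ord")))  # [1, 2]
```

The toy index also shows the drawback mentioned in the thread: common characters appear in nearly every document, so their posting lists get very long.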
Re: wildcard and proximity searches
Thanks Ahmet.

Is it also possible to search for documents that have a field ENDING with week*? The query should return documents with a field ending in "week" or its derivatives, such as "weekly" and "weeks". So the above query should return:

this week
Past three weeks
Report weekly

Thanks,
chandan

On Tue, Oct 5, 2010 at 5:04 PM, Ahmet Arslan iori...@yahoo.com wrote:

> > Also does this plugin allow us to use proximity with wildcards, for example "solr mail*"~10?
>
> Yes, it supports "solr mail*"~10 kind of queries without any problem. Currently it throws an exception with "mail*" kind of queries, but those are not valid phrase queries, because there is only one clause inside the quotation marks.

-- Nipen Mark
Re: question on wild card
Thanks Erick.

One more question: when "the perfect world*" is passed as the search query, it is converted to "? perfect world". What does the ? mean? Since I am using the standard analyzer, I thought the stop word "the" would be removed.

Thanks

On Thu, Jul 15, 2010 at 7:01 AM, Erick Erickson erickerick...@gmail.com wrote:

> The best way to understand how things are parsed is to go to the Solr admin page (Full interface link?), click the debug info box, and submit your query. That'll tell you exactly what happens.
>
> Alternatively, you can put debugQuery=on on your URL...
>
> HTH
> Erick
>
> On Wed, Jul 14, 2010 at 8:48 AM, Mark N nipen.m...@gmail.com wrote:
>
> > I have a database field = hello world, and I am indexing it to the *text* field with the standard analyzer (text is a copy field in Solr).
> >
> > Now when a user gives the query text:hello world%, how is the query interpreted in the background? Are we actually searching text: hello OR text: world% (assuming the default operator is OR)?
> >
> > -- Nipen Mark

-- Nipen Mark
question on wild card
I have a database field = hello world, and I am indexing it to the *text* field with the standard analyzer (text is a copy field in Solr).

Now when a user gives the query text:hello world%, how is the query interpreted in the background? Are we actually searching text: hello OR text: world% (assuming the default operator is OR)?

-- Nipen Mark
Two analyzer per field
Is it possible to specify two analyzers per field? For example, consider:

F1 (keyword analyzer) = cheers mate
F2 (keyword analyzer) = hello world

There is also a copy field TEXT (standard analyzer) which will store the terms { cheers mate hello world }.

When a user performs any search, we look only at the copy field TEXT, which uses the standard analyzer. Suppose the user searches for the phrase "hello world": it will not return any result, because the terms hello and world are tokenized separately.

Is it possible to also index "hello world" as-is into the TEXT field? That is, can I use the keyword analyzer as well as the standard analyzer for the field TEXT? What would be a better approach to handle this situation?

-- Nipen Mark
Solr DataImportHandler
Is it possible to use the Solr DataImportHandler when the database fields are not fixed?

As per my findings, we need to configure which table (entity) the data is read from, and which database fields map to which fields in the Solr schema. Since in my case the database fields could be dynamic, can DIH be helpful? Please suggest.

-- Nipen Mark
indexing a huge data
What is the fastest way to index documents? I am indexing a huge collection of data after extracting certain metadata, for example the author and filename of each file. I am extracting this information and storing it in XML format, for example:

<fileid>1</fileid><author>abc</author><filename>abc.doc</filename>
<fileid>2</fileid><author>abc</author><filename>abc1.doc</filename>

I cannot index these documents directly to Solr because they are not in the format Solr requires (and I cannot change the format, as it is used in other modules).

Would converting these files to CSV be a better and faster approach compared to XML? Please suggest.

-- Nipen Mark
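A conversion step like the one asked about here could look roughly like the sketch below, which turns the extracted metadata into CSV with a header row. The wrapper elements `<files>` and `<file>` are assumptions (the message only shows the `fileid`, `author` and `filename` elements), and the function name is made up for the example. It makes no claim about whether CSV indexes faster than XML.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Field names taken from the message; the record/root element names are assumed.
FIELDS = ["fileid", "author", "filename"]


def metadata_xml_to_csv(xml_text, record_tag="file"):
    """Convert one metadata XML document into CSV text with a header row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for record in root.iter(record_tag):
        # Missing elements become empty strings; stray whitespace is trimmed.
        writer.writerow([(record.findtext(name) or "").strip() for name in FIELDS])
    return out.getvalue()


sample = """<files>
  <file><fileid>1</fileid><author>abc </author><filename>abc.doc</filename></file>
  <file><fileid>2</fileid><author>abc </author><filename>abc1.doc</filename></file>
</files>"""

print(metadata_xml_to_csv(sample))
```

The same loop could just as easily emit Solr's own `<add><doc>...</doc></add>` XML instead of CSV, so the existing format used by the other modules does not need to change either way.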
Re: Getting max/min dates from solr index
Thanks.

Is it possible to do date faceting across multiple Solr shards? I am using indexes created in two different shards to do date faceting on the field DATE:

http://localhost:8983/solr/1_13_1_3/select?shards=localhost:8983/solr/index1/,localhost_two:8983/solr/index/&start=0&rows=20&q=*&facet=true&facet.date=DATE&facet.date.start=2004-01-01T00:00:00Z&facet.date.end=2011-01-01T00:00:00Z&facet.date.gap=%2B1YEAR

On Fri, Feb 12, 2010 at 3:39 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

> Mark,
>
> Yes, facets will give you that information. Min/max: StatsComponent? See http://www.search-lucene.com/?q=StatsComponent
>
> Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
> ----- Original Message -----
> From: Mark N nipen.m...@gmail.com
> To: solr-user@lucene.apache.org
> Sent: Wed, February 10, 2010 8:12:43 AM
> Subject: Getting max/min dates from solr index
>
> > How can we get the max and min dates from the Solr index? I need these dates to draw a graph (for example, a timeline graph).
> >
> > Also, can we use date faceting to show how many documents are indexed every month? Consider that I need to draw a timeline graph for the current year showing how many records were indexed each month, so I will have months on the X axis and the number of documents on the Y axis.
> >
> > What would be a better approach to designing a schema to achieve this functionality? Any suggestions would be appreciated.
> >
> > Thanks
> > -- Nipen Mark

-- Nipen Mark
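Long request URLs like the one in this message are easy to get wrong by hand, since every parameter needs an `&` separator and the `+` in the gap has to be escaped as `%2B`. A small helper can build them. This is a sketch only: the host names and the DATE field come from the message, while the function name and the match-all query `*:*` are assumptions.

```python
from urllib.parse import urlencode


def date_facet_url(base_url, shards, field, start, end, gap, rows=20):
    """Build a select URL with date-facet parameters, escaping each value."""
    params = [
        ("shards", ",".join(shards)),
        ("start", 0),
        ("rows", rows),
        ("q", "*:*"),  # assumed match-all query
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", start),
        ("facet.date.end", end),
        ("facet.date.gap", gap),  # "+1YEAR" is encoded as %2B1YEAR
    ]
    return base_url + "?" + urlencode(params)


url = date_facet_url(
    "http://localhost:8983/solr/1_13_1_3/select",
    ["localhost:8983/solr/index1/", "localhost_two:8983/solr/index/"],
    "DATE",
    "2004-01-01T00:00:00Z",
    "2011-01-01T00:00:00Z",
    "+1YEAR",
)
print(url)
```

For the month-by-month timeline asked about in the original question, the same helper could be called with a gap of `+1MONTH` and a one-year start/end range.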
Getting max/min dates from solr index
How can we get the max and min dates from the Solr index? I need these dates to draw a graph (for example, a timeline graph).

Also, can we use date faceting to show how many documents are indexed every month? Consider that I need to draw a timeline graph for the current year showing how many records were indexed each month, so I will have months on the X axis and the number of documents on the Y axis.

What would be a better approach to designing a schema to achieve this functionality? Any suggestions would be appreciated.

Thanks
-- Nipen Mark
solr updateCSV
I am trying to use Solr's CSV updater to index data. I am trying to specify the .dat format, which consists of a field separator, a text qualifier, and a line separator. For example:

<field 1><field separator><field 2><field separator>
<text qualifier>value for field 1<text qualifier><field separator><text qualifier>value for field 2<text qualifier><field separator><line separator>

Can we specify the text qualifier and line separator as well? I have tested that we can specify a separator, and that works well.

-- Nipen Mark
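As far as I know, the CSV update handler accepts a `separator` parameter and an `encapsulator` parameter (which plays the role of the text qualifier); I am not aware of a parameter for a custom line separator. If that turns out to be a blocker, one option is to normalize the .dat file into ordinary CSV before posting it. The sketch below assumes single-character field separator and qualifier, and the sample characters (`|`, `^`, `#`) are invented for the example.

```python
import csv
import io


def dat_to_csv(dat_text, field_sep, text_qualifier, line_sep):
    """Rewrite delimiter/qualifier/line-separator text as standard CSV.

    Limitation: a line separator that also occurs inside a qualified value
    will be treated as a record break.
    """
    normalized = dat_text.replace(line_sep, "\n")
    reader = csv.reader(
        io.StringIO(normalized, newline=""),
        delimiter=field_sep,
        quotechar=text_qualifier,
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # skip empty records, e.g. after a trailing separator
            writer.writerow(row)
    return out.getvalue()


sample = "^id^|^title^#^1^|^hello, world^#"
print(dat_to_csv(sample, field_sep="|", text_qualifier="^", line_sep="#"))
```

The output uses a comma separator and double-quote qualifier, so it can be posted without any extra format parameters.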
Indexing large text documents
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("Fulltext", strContent);

strContent is a string variable that holds the contents of a text file (assume the text file is located at c:\files\abc.txt).

In my case abc.txt (and text files in general) could be very large, around 2 GB, so it is not always possible to read them into a string variable before indexing. Can anyone suggest a better approach for indexing these huge text files?

-- Nipen Mark
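One way to avoid holding a whole 2 GB file in a single string is to read it in bounded pieces and handle each piece separately, for instance by indexing every piece as its own document with a shared file id. That splitting strategy is an assumption rather than something Solr requires; the sketch below shows only the chunked reading, cutting at whitespace so that no word is split between two pieces. The function name and chunk size are illustrative.

```python
import io


def iter_text_chunks(stream, chunk_chars=1_000_000):
    """Yield pieces of roughly chunk_chars characters, cut at whitespace."""
    carry = ""
    while True:
        block = stream.read(chunk_chars)
        if not block:
            break
        block = carry + block
        cut = max(block.rfind(" "), block.rfind("\n"))
        if cut == -1:
            carry = block  # no whitespace seen yet, keep accumulating
            continue
        piece = block[:cut]
        if piece.strip():
            yield piece
        carry = block[cut + 1:]  # unfinished last word goes to the next piece
    if carry.strip():
        yield carry


chunks = list(iter_text_chunks(io.StringIO("alpha beta gamma delta"), chunk_chars=8))
print(chunks)  # ['alpha', 'beta', 'gamma', 'delta']
```

With a real file, the stream would be an open text file handle, e.g. `open(path, encoding="utf-8")`. Memory use then stays near the chunk size unless the file contains very long runs without whitespace.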
Enumerating wildcard terms
Is it possible to enumerate all terms that match a specified wildcard filter term, similar to Lucene's WildcardTermEnum API? For example, if I search for abc*, I should be able to access all the terms abc1, abc2, abc3, ... that exist in the index.

What would be a better approach to achieve this functionality?

-- Nipen Mark
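For a trailing wildcard such as abc*, enumeration amounts to a prefix scan over the sorted term dictionary. Solr's TermsComponent exposes this kind of lookup through parameters such as `terms.fl` and `terms.prefix`, if it is available in your version. The sketch below models the scan itself on a plain sorted list; the function name and the sample terms are invented for the example.

```python
from bisect import bisect_left


def terms_with_prefix(sorted_terms, prefix):
    """Return every term starting with prefix from an already sorted term list."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches


terms = sorted(["abc1", "abc2", "abc3", "abd", "xyz"])
print(terms_with_prefix(terms, "abc"))  # ['abc1', 'abc2', 'abc3']
```

Wildcards in other positions (leading, or in the middle of the term) cannot use this shortcut and generally require checking many more terms.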
Re: nested solr queries
Hi Shalin,

I am trying to achieve something like a JOIN. Previously I was doing this with two queries on Solr:

Solr index = (field1, field2, field3)
query1 = (for example, field1=ABC)

Suppose query1 returns the result set set1 = { 1, 2, 3, 4 } of matching records.

query2 = (get all records having field2=xyz, for each record in set1 = { 1, 2, 3, 4 } returned by query1)

I am not sure whether I could do something like this using the nested Solr queries described at http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/

Thanks

On Mon, Nov 30, 2009 at 1:50 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

> On Mon, Nov 30, 2009 at 1:19 PM, Mark N nipen.m...@gmail.com wrote:
>
> > Is it possible to write nested queries in Solr, similar to an SQL-like query, where I can take the results of the first query and use one or more of its fields as an argument in the second query?
>
> That sounds like a join. If so, the answer would be no.
>
> > For example: field1:XYZ AND (_query_: field3:{value of field4})
> >
> > This should search for all types of XYZ and then iterate over the result set and perform a query where field3 is equal to the value of field1 from each item of the first result set.
>
> Your description is not consistent with the query you have given. If field:XYZ is specified, then what are types of XYZ? Also, if you want to perform a query where field3 is equal to the value of field1, then what is field4 in the query you have given?
>
> > This is similar to an SQL-like query: select distinct ( fieldA ) from table where fieldA IN
>
> That sounds similar to faceting. See http://wiki.apache.org/solr/SimpleFacetParameters
>
> Perhaps you can give more details on what you want to achieve.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: nested solr queries
We don't know field2=xyz until we run query1.

To simplify, I was actually trying to do some kind of JOIN, similar to the following SQL query:

select * from table1 where field2 in ( select field2 from dbo.concept_db where field1='ABC' )

If this is not possible, then I will have to run the inner query ( select field2 from dbo.concept_db where field1='ABC' ) first, and only then run the outer query.

Thanks,
chandan

On Mon, Nov 30, 2009 at 2:25 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

> On Mon, Nov 30, 2009 at 2:02 PM, Mark N nipen.m...@gmail.com wrote:
>
> > Hi Shalin,
> >
> > I am trying to achieve something like a JOIN. Previously I was doing this with two queries on Solr:
> >
> > Solr index = (field1, field2, field3)
> > query1 = (for example, field1=ABC)
> >
> > Suppose query1 returns the result set set1 = { 1, 2, 3, 4 } of matching records.
> >
> > query2 = (get all records having field2=xyz, for each record in set1 = { 1, 2, 3, 4 } returned by query1)
>
> That sequence of queries will return documents which have field1=ABC and field2=xyz. The same result can be obtained in one query with q=+field1:ABC +field2:xyz
>
> Have I misunderstood the problem?
>
> > I am not sure whether I could do something like this using the nested Solr queries described at http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
>
> No, nested queries can only influence scores. They do not filter the results.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: nested solr queries
Thanks for your help. So do you think I should execute the Solr queries twice, or are there any other workarounds?

On Mon, Nov 30, 2009 at 3:07 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

> On Mon, Nov 30, 2009 at 2:26 PM, Mark N nipen.m...@gmail.com wrote:
>
> > We don't know field2=xyz until we run query1.
>
> Ah, ok. I thought xyz was a literal that you wanted to search.
>
> > To simplify, I was actually trying to do some kind of JOIN, similar to the following SQL query:
> >
> > select * from table1 where field2 in ( select field2 from dbo.concept_db where field1='ABC' )
> >
> > If this is not possible, then I will have to run the inner query ( select field2 from dbo.concept_db where field1='ABC' ) first, and only then run the outer query.
>
> No, there are no joins in Solr. Consider de-normalizing your schema, if you haven't.
>
> --
> Regards,
> Shalin Shekhar Mangar.

-- Nipen Mark
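The two-query workaround discussed in this thread can be sketched on the client side: run the inner query, collect the distinct field2 values from its results, and build the outer query from them. The sketch covers only the query-string construction; the function name is hypothetical, and sending the queries to Solr is left out.

```python
def build_outer_query(field, values):
    """Build 'field:("v1" OR "v2" ...)' from the values returned by the inner query."""
    unique = sorted(set(values))
    if not unique:
        return None  # inner query matched nothing, so skip the outer query
    quoted = []
    for value in unique:
        # Escape backslashes and double quotes so each value stays one phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    return field + ":(" + " OR ".join(quoted) + ")"


# Values of field2 collected from the results of the inner query (field1:ABC).
inner_values = ["xyz", "abc", "xyz"]
print(build_outer_query("field2", inner_values))  # field2:("abc" OR "xyz")
```

If the inner query returns many distinct values, the outer query may exceed the configured limit on boolean clauses (`maxBooleanClauses`), in which case de-normalizing the schema as suggested in the thread is likely the more robust route.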
nested solr queries
Is it possible to write nested queries in Solr, similar to an SQL-like query, where I can take the results of the first query and use one or more of its fields as an argument in the second query?

For example: field1:XYZ AND (_query_: field3:{value of field4})

This should search for all types of XYZ and then iterate over the result set and perform a query where field3 is equal to the value of field1 from each item of the first result set.

This is similar to an SQL-like query: select distinct ( fieldA ) from table where fieldA IN