Re: solr
Please follow the tutorial: http://lucene.apache.org/solr/tutorial.html

/Sumit

On Sat, Aug 21, 2010 at 11:25 AM, ankita shinde ankita.shind...@gmail.com wrote:
> Hello, I am new to Solr. Can anyone please guide me on how to install and use Solr?
> -Ankita Shinde
Re: Require some advice
Hi Pavan,

you may want to plug in UIMA as a custom UpdateRequestProcessor [1] while indexing data (I am working on such a use case). This way you could extract entities and add them either as dynamicFields or as predefined (fixed) fields.

2010/8/12 Michael Griffiths mgriffi...@am-ind.com
> While there are some decent open source entity extraction tools, they are focused on processing sentences and paragraphs. The structural differences in text messages mean you'd need to do a fair amount of work to get decent entity extraction. That said, you may want to look into simple word/phrase matching if your domain is sufficiently small. Use RegEx to extract ZIP, use dictionaries to extract city/area, skills, and names. Much simpler and cheaper.

In UIMA you have some components that may be useful (DictionaryAnnotator, ConceptMapper, Tagger, RegExAnnotator [2]) for the above cases. However, as Michael underlined, you have to consider the effort needed to understand, use and eventually customize such components. UIMA is well suited for large-scale collections of data and lets you work on a flexible and customizable analysis pipeline that may change and be enriched in the future, but you have to evaluate carefully whether you need it.

2010/8/12 Nagelberg, Kallin knagelb...@globeandmail.com
> Try this, http://viewer.opencalais.com/

The OpenCalais service is wrapped as a UIMA analysis engine and may be called inside a UIMA pipeline together with other components (see above) or services (e.g. the UIMA-wrapped Alchemy API service [3]). That said, this makes sense only if you are strongly focused on searching over text and its extracted entities.

My 2 cents,
Tommaso

[1] : http://wiki.apache.org/solr/UpdateRequestProcessor
[2] : http://uima.apache.org/annotators.html
[3] : http://svn.apache.org/viewvc/uima/sandbox/trunk/AlchemyAPIAnnotator/
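The simple word/phrase matching that Michael suggests could be sketched roughly as follows. This is only an illustration, not UIMA and not anything from Solr: the dictionaries, the US-style ZIP pattern and the function name `extract_entities` are all assumptions made for the example.

```python
import re

# Hypothetical dictionaries; real ones would be loaded from domain data.
CITIES = {"toronto", "new york", "boston"}
SKILLS = {"java", "python", "solr"}

# Assumption: US-style ZIP codes (5 digits, optional +4 suffix).
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def extract_entities(message):
    """Return ZIP codes, cities and skills found in a short text message."""
    lowered = message.lower()

    def matches(dictionary):
        # Whole-word, case-insensitive lookup of every dictionary term.
        return sorted(
            term for term in dictionary
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)
        )

    return {
        "zip": ZIP_RE.findall(message),
        "city": matches(CITIES),
        "skill": matches(SKILLS),
    }
```

The extracted values could then be sent to Solr as ordinary fields or dynamicFields. For a small domain this stays cheap, and the dictionaries are the only part that needs maintenance.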
solr
Hi all, is there a need to assign a unique id to every file? Do we have to specify the id manually, or does Solr do it? How do I assign a unique id to a text file?
Re: solr
Hello!

> is there need to allot a unique id to every file?

You don't need one. A unique id is not mandatory, but many features won't work without it.

> do we have to specify the id manually or solr does it?

Solr doesn't do it automatically; you have to do it.

> how to allot an unique id to text file?

Just generate an id in your application and pass it in, e.g., the XML file. If you have questions about the unique id, this page is the place to look: http://wiki.apache.org/solr/UniqueKey

-- Regards, Rafał Kuć
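Generating the id in the application and passing it in the XML could look roughly like the sketch below. It assumes a schema with an `id` field as the uniqueKey and a `text` field for the file contents; both field names and the helper `build_add_xml` are illustrative only.

```python
import uuid
import xml.etree.ElementTree as ET


def build_add_xml(text, doc_id=None):
    """Build a Solr <add> message for one document, generating an id if needed."""
    # Assumption: a random UUID is an acceptable id for this application.
    doc_id = doc_id or str(uuid.uuid4())
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # Field names "id" and "text" are assumptions about the schema.
    for name, value in (("id", doc_id), ("text", text)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return doc_id, ET.tostring(add, encoding="unicode")
```

The returned XML string is what would be posted to the update handler. Keeping the returned id lets the application update or delete the same document later.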
solr
Hi, does all the data to be indexed have to be in the exampledocs folder? How do I import data from MySQL? I have tried the steps on http://wiki.apache.org/solr/DataImportHandler, but it gives me the error: could not create importer.dataimporter. What does it mean? I am completely new to Solr. How do I configure Solr?
Possible to have more than 1 uniqueKey fields in a document?
Is it possible to define more than 1 uniqueKey fields per document in schema.xml?
Duplicate docs when mergin
How to Debug Sol-Code in Eclipse ?!
Hello. Can anyone give me some tips on debugging the Solr code in Eclipse? Or do I need Apache Ant to do this? Thanks =)
Re: solr
Hi Ankita,

first: thanks for trying Apache Solr.

> does all the data to be indexed has to be in exampledocs folder?

No. And there are several ways to push data into Solr: via indexing, DataImportHandler, SolrJ, ...

I know that getting comfortable with a new project is a bit complicated at first, but you should try to read some of the information available on the wiki etc. before posting in a rush. (Keep in mind that a mailing list is mostly driven by people in their free time.)

Here are some links:
http://wiki.apache.org/solr/FAQ
http://wiki.apache.org/solr/SolrResources
http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-1
http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2

Another nice piece of documentation is: http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide

I hope you don't misunderstand me! So, ask if you need help, but try to dig deeper first!

Regards, Peter.

> hi, does all the data to be indexed has to be in exampledocs folder? how to import data from mysql? I have tried the steps on http://wiki.apache.org/solr/DataImportHandler. but its giving me error as could not create importer.dataimporter. What does it mean? I am completely new to solr. How to configure solr?
Re: /update/extract
The Extract Request Handler invokes the classes from the extraction package:
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java

This is packaged into the apache-solr-cell jar.

Regards, Jayendra

On Thu, Aug 19, 2010 at 10:04 AM, satya swaroop sswaro...@gmail.com wrote:
> Hi all, when we handle extract request handler what class gets invoked.. I need to know the navigation of classes when we send any files to solr. can anybody tell me the classes or any sources where i can get the answer.. or can anyone tell me what classes get invoked when we start the solr... I be thankful if anybody can help me with regarding this..
> Regards, satya
Re: Duplicate docs when merging indices?
On Sat, 21 Aug 2010 05:26:59 -0700 (PDT) Andrew Clegg andrew.cl...@gmail.com wrote: [...] If I merge two indices with CoreAdmin, as detailed here... http://wiki.apache.org/solr/MergingSolrIndexes What happens to duplicate documents between the two? i.e. those that have the same unique key. What decides which copy takes precedence? Will documents get indexed multiple times, or will the second one just get skipped? [...] Have not used CoreAdmin, but with MergeTool, know from personal experience that there would be duplicates created. I imagine that the same is the case for CoreAdmin as Solr/Lucene allows duplicate IDs. Regards, Gora
Re: facets - id and display value
Faceting harvests the fields that are already indexed (so you have to both store and index the fields) and uses Java object refs (pointers), without copying the facet values.

You know how log files have multi-line exception stacks and the like? The multi-line exception stacks after the real log line tend to be the same. I grabbed all of the lines after each log line and made facets out of them. That worked quite well for counting "this exception stack happened 42 times, this other one 250 times". So huge string fields work as facets.

I don't know if 'facet.prefix' on 50 characters is faster than 'q=' on 200 characters. Sending a giant query is easy: use a POST instead of a GET.

If searching on giant facet strings really is a problem, add a hash code to each facet string. Then add a separate matching field in each document that only stores that hash code. Now, instead of searching on the giant facet, you pull the hash code out of it and search the separate field for that.

On Fri, Aug 20, 2010 at 9:56 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
>> A common way is to make a facet string of categoryId-2_name_imageurl. Then in your UI display the categoryId part of the facet.
>
> I've been thinking about doing something like this for the same purposes. Will having an extra long facet string like that have any effect on faceting performance? How about facet sorting with facet.sort=index?
>
> In my case, the first part of the facet string would be a 'sortable' value that sorts how I want, not just an id. I use facet.sort=index, but my display labels don't actually sort the way I want, so I'm thinking of making a sort key that does, and storing sortkey_label in the actual facet value. But I worry this may have an effect on performance if the string gets really long. But I'm thinking/hoping it won't -- at least for faceting the length of the string shouldn't matter, I think, but I'm not sure about sorting.
>
> [Obviously you have to make sure not to accidentally store the same 'id' with differently serialized 'metadata', or you'd wind up with two facet values where you meant to have one].
>
> Is there any reason I couldn't use some non-printing control char as the separator, instead of the ASCII underscore in that example?
>
> And then the other thing is, once I have these weird long facet strings with embedded 'metadata', if I actually want to 'fq' on one, I need to pass that whole weird string in the fq, clearly. How do people generally deal with this, using this technique? Just do it, pass the whole string? Use some sort of 'prefix' technique (I guess that would be the * wildcard in the fq)? Use two different Solr fields, one for faceting with embedded metadata, and a different one with the same values without embedded metadata for actual 'fq' filtering?
>
> Thanks for any tips, Jonathan

-- Lance Norskog goks...@gmail.com
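The two ideas in this exchange, a sortkey_label facet value and a separate hash field to filter on, might be sketched like this. The helper names and the choice of SHA-1 are assumptions for illustration. The sketch keeps the underscore separator from the thread rather than a control character, because most control characters below U+0020 are not legal in XML 1.0 and could cause trouble if documents are posted as XML.

```python
import hashlib

SEP = "_"  # separator between sort key and label, as in the thread's example


def make_facet_value(sort_key, label):
    """Combine a sortable key and a display label into one facet string."""
    if SEP in sort_key:
        # Otherwise the value could not be split back unambiguously.
        raise ValueError("sort key must not contain the separator")
    return sort_key + SEP + label


def split_facet_value(facet_value):
    """Recover (sort_key, label); only the first separator is significant."""
    sort_key, label = facet_value.split(SEP, 1)
    return sort_key, label


def facet_hash(facet_value):
    """Short fixed-length token to index in a separate field and use in fq."""
    return hashlib.sha1(facet_value.encode("utf-8")).hexdigest()
```

At index time the document would carry both the long facet string and its hash in a second field (name up to you). At query time the UI takes the facet value the user clicked, computes the same hash, and filters on the hash field instead of passing the whole string.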
Re: solr
This will make a unique key for you:

In <types>:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />

In <fields>:
<field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>

2010/8/21 Rafał Kuć ra...@alud.com.pl:
> Hello!
>> is there need to allot a unique id to every file?
> You don't need one, unique id is not mandatory, but many features wont work without it.
>> do we have to specify the id manually or solr does it?
> Solr doesn't do it automatically, You have to do it.
>> how to allot an unique id to text file?
> Just generate an id in your application and pass it to ie. xml file. If you have some questions about uniqe id, this page should be a place for You http://wiki.apache.org/solr/UniqueKey
> -- Regards, Rafał Kuć

-- Lance Norskog goks...@gmail.com
Re: Possible to have more than 1 uniqueKey fields in a document?
There can be as many fields with unique values as you want, but you can only specify one as the uniqueKey. That is used for Distributed Search and deduplication. Indexing might work better if you concatenate the different unique values into one field.

On Sat, Aug 21, 2010 at 3:27 AM, Andy angelf...@yahoo.com wrote:
> Is it possible to define more than 1 uniqueKey fields per document in schema.xml?

-- Lance Norskog goks...@gmail.com
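Concatenating several unique values into a single key field could be done in the indexing application along these lines. The function name and the "|" separator are assumptions; the only real requirement is that the separator never occurs inside a value, so that two different value combinations cannot produce the same key.

```python
def composite_key(*parts, sep="|"):
    """Join several unique values into one string usable as the uniqueKey."""
    values = [str(part) for part in parts]
    for value in values:
        if sep in value:
            # A separator inside a value could make two different inputs collide.
            raise ValueError(f"separator {sep!r} occurs in {value!r}")
    return sep.join(values)
```

For example, `composite_key("abc123", 42)` gives `"abc123|42"`. The individual values can still be indexed in their own fields for searching.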
Re: How to Debug Sol-Code in Eclipse ?!
Running unit tests is easy, once you set the right 'current directory' so that unit tests can find their resource files. I have found that if I get a full set of unit tests for something, I don't have to debug it in the full app. Running the whole thing as a servlet has the whole servlet engine setup thing, which I avoid. Running as EmbeddedSolr might be easy, I haven't tried it. I usually make a separate empty Java project and import source and libs as needed. I do a lot of 'search everything for this string' so having the whole source tree in the project just slows me down. This does remove the ability to use the svn/git management, but I don't mind that. On Sat, Aug 21, 2010 at 5:27 AM, stockii st...@shopgate.com wrote: Hello.. Can anyone give me some tipps to debug the solr-code in Eclipse ? or do i need apache-Ant to do this ? thhx =) -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-Debug-Sol-Code-in-Eclipse-tp1262050p1262050.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: Possible to have more than 1 uniqueKey fields in a document?
I'm still a bit confused. Can I define 2 uniqueKey fields in schema.xml? I want to use 2 outside apps. One defines a uniqueKey that is a mix of letters and numbers. The other app requires a uniqueKey of the type long. Obviously the 2 requirements aren't compatible. I'm trying to see if it's possible to define 2 uniqueKeys so each app could have its own.

--- On Sat, 8/21/10, Lance Norskog goks...@gmail.com wrote:
From: Lance Norskog goks...@gmail.com
Subject: Re: Possible to have more than 1 uniqueKey fields in a document?
To: solr-user@lucene.apache.org
Date: Saturday, August 21, 2010, 5:23 PM
> There can be as many as you want. Buy you can only specify one as the uniqueKey. That is used for Distributed Search and deduplication. Indexing might work better if you concatenate the different unique values into one field.
>
> On Sat, Aug 21, 2010 at 3:27 AM, Andy angelf...@yahoo.com wrote:
>> Is it possible to define more than 1 uniqueKey fields per document in schema.xml?
>
> -- Lance Norskog goks...@gmail.com
Re: Possible to have more than 1 uniqueKey fields in a document?
You can only have one field marked as the unique key in Solr. That's it. If you happen to have two unique values per document, that is OK. Only one of them can be the official unique key. It's just like a primary key in a database table. You can't have two primaries. There can only be one. - Highlander

On Sat, Aug 21, 2010 at 5:00 PM, Andy angelf...@yahoo.com wrote:
> I'm still a bit confused. Can I define 2 uniqueKey fields in schema.xml? I want to use 2 outside apps. One define a uniqueKey that is a mix of alphabets and numbers. Another app requires a uniqueKey of the type long. Obviously the 2 requirements aren't compatible. I'm trying to see if it's possible to define 2 uniqueKeys so each app could have its own one.
>
> --- On Sat, 8/21/10, Lance Norskog goks...@gmail.com wrote:
> From: Lance Norskog goks...@gmail.com
> Subject: Re: Possible to have more than 1 uniqueKey fields in a document?
> To: solr-user@lucene.apache.org
> Date: Saturday, August 21, 2010, 5:23 PM
>> There can be as many as you want. Buy you can only specify one as the uniqueKey. That is used for Distributed Search and deduplication. Indexing might work better if you concatenate the different unique values into one field.
>>
>> On Sat, Aug 21, 2010 at 3:27 AM, Andy angelf...@yahoo.com wrote:
>>> Is it possible to define more than 1 uniqueKey fields per document in schema.xml?
>>
>> -- Lance Norskog goks...@gmail.com

-- Lance Norskog goks...@gmail.com
Autocomplete and Sorting on multiple multi-value/single-value fields
Hi,

I'm wondering if anyone has run across this issue before. I do understand that you cannot sort on a multivalued field, so I'm looking for alternatives people have used.

Let's say I have nine fields:

<field name="title" type="text" indexed="true" stored="true" required="true"/>
<field name="titleac" type="autocomplete" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="titlesort" type="alphaOnlySort" indexed="true" stored="true"/>
<field name="cast" type="text" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="castac" type="autocomplete" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true"/>
<field name="crew" type="text" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="crewac" type="autocomplete" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" multiValued="true"/>

The text field type is standard:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

The autocomplete field type is pretty standard as well:

<fieldType name="autocomplete1" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

I need the sort to be case sensitive, including punctuation etc., so that field type looks like this:

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

So if I do this:

http://localhost:8983/solr/core/select/?q=titleac:dr&version=2.2&start=0&rows=100&indent=on&fl=title&sort=titlesort asc

everything works and I get a set of autocompleted results starting with "dr" in all forms, sorted. Exactly what I want.

The problem is that I also need to do this:

http://localhost:8983/solr/core/select/?q=(titleac:dr or castac:dr)&version=2.2&start=0&rows=100&indent=on&fl=title,cast

(and the results need to be sorted across both the title field and a match in the multivalued cast field)

And I also need to do this:

http://localhost:8983/solr/core/select/?q=(titleac:dr or castac:dr or crewac:dr)&version=2.2&start=0&rows=100&indent=on&fl=title,cast,crew

(and the results need to be sorted across the title field, a match in the multivalued cast field, or a match in the multivalued crew field)

As you can see, I'm trying to autocomplete across multiple fields, some of which are multi-valued, and then sort those results in Solr so that Solr does all my paging work. This way I don't have to load the full result sets into my JVM client and then manually sort them each time. You can also see I'm trying to make it into one query, as my assumption is that this will take the least amount of time.

Would anyone happen to have suggestions on how I'm approaching this problem?

Thanks, Neil
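For reference, the multi-field autocomplete requests described above could be assembled on the client roughly as follows. This is only a sketch of building the URL, not a solution to the sorting question. The helper name `autocomplete_url` is hypothetical, the base URL and parameters are taken from the examples in the post, and the sketch uses uppercase `OR`, which the standard Lucene query parser treats as the boolean operator.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/core/select/"


def autocomplete_url(prefix, ac_fields, return_fields, sort=None, start=0, rows=100):
    """Build a select URL that matches `prefix` in any of the autocomplete fields."""
    # Note: `prefix` is inserted as-is; query syntax characters are not escaped here.
    query = "(" + " OR ".join(f"{field}:{prefix}" for field in ac_fields) + ")"
    params = [
        ("q", query),
        ("version", "2.2"),
        ("start", start),
        ("rows", rows),
        ("indent", "on"),
        ("fl", ",".join(return_fields)),
    ]
    if sort:
        params.append(("sort", sort))
    # urlencode percent-encodes parentheses, colons and spaces for us.
    return BASE_URL + "?" + urlencode(params)
```

Calling `autocomplete_url("dr", ["titleac", "castac"], ["title", "cast"])` corresponds to the second request in the post. Passing `sort="titlesort asc"` adds the sort used in the first one.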