Format content field
Greetings all! I have created an enterprise search architecture that uses Nutch for crawling and Solr for indexing. I was so focused on the Nutch part that I didn't realize that my user interface (jQuery based) was lacking in appeal.

One of my issues is the format of the text in the content field. Is there any way to force it to include spaces, etc. in the text? For instance, this is an example of a value:

thereisno way to know.Next sentence goes here.BUT I am all squished This is sample content from a html page.

--
View this message in context: http://lucene.472066.n3.nabble.com/Format-content-field-tp3941336p3941336.html
Sent from the Solr - User mailing list archive at Nabble.com.
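A minimal sketch of one way the squishing can come about and be avoided: if tags are stripped without emitting a separator at block boundaries, adjacent paragraphs run together. The example below uses only Python's standard `html.parser` and is not Nutch's actual parser; the `BLOCK_TAGS` set and the `extract_text` helper are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Illustrative subset of block-level tags; a real list would be longer.
BLOCK_TAGS = {"p", "div", "br", "li", "td", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class SpacedTextExtractor(HTMLParser):
    """Collects text and emits a space at every block-level boundary."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html):
    """Return the visible text with block boundaries kept as single spaces."""
    parser = SpacedTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into one space and trim the ends.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    page = "<p>there is no way to know.</p><p>Next sentence goes here.</p>"
    print(extract_text(page))
```

Inline tags such as `<b>` deliberately add no space, so a word that is only partly bold stays in one piece.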
Re: Taxonomy and Faceting
Based on this:

    <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
    <concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
    <lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
    <cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
    <entities_apikey>VALID_ALCHEMYAPI_KEY</entities_apikey>
    <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>

...this can't be used unless you use some sort of processing engine? I am playing around with some other open source tagging software, but I have yet to get very far.

--
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2079148.html
Re: Taxonomy and Faceting
Any luck with a tutorial? :-)

--
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2040246.html
Re: Taxonomy and Faceting
Can someone enlighten me on how to get started with this patch? I am running Solr 1.4.1, and obviously I need to download the latest trunk and apply the patch. After that, though, I am sort of clueless. I am assuming some things have to happen in the Solr config and schema files. Reading the code now...

--
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2033563.html
Re: Taxonomy and Faceting
That would be AMAZING!! And much appreciated ;-)

--
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2033657.html
Re: Solr Newbie - need a point in the right direction
In my experience, the hardest (but most flexible) part is exactly what was mentioned: processing the data. Nutch has a really easy plugin interface that you can use, and the example plugin is a great place to start. Once you have the raw parsed text, you can do whatever you want with it.

For example, I wrote a plugin to add geospatial information to my NutchDocument. You then map the fields you added to the NutchDocument to fields you want Solr to index. In my case I created a geography field where I put lat/lon info. You then declare that same geography field in the Nutch-to-Solr mapping file as well as in your Solr schema.xml file.

When you run the crawl and tell it to use solrindex, it will send the document to Solr to be indexed. Since the new field is in the schema, Solr knows what to do with it at index time. Now you can build a user interface around what you want to do with that field.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Newbie-need-a-point-in-the-right-direction-tp2031381p2033687.html
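As a rough sketch of what ends up being sent to Solr once the extra field is mapped, the snippet below builds a document in Solr's XML update format (`<add><doc><field name="...">`). In the setup described above Nutch's solrindex does this step itself; the `build_solr_add` helper, the example URL, and the "lat,lon" string layout of the geography value are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def build_solr_add(doc):
    """Serialize a dict of field name -> value(s) as a Solr <add> message.

    List or tuple values become repeated <field> elements, which is how a
    multivalued field is expressed in the XML update format.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # "geography" is the field name from the post; its value format is assumed.
    print(build_solr_add({
        "id": "http://example.com/",
        "geography": "38.89,-77.03",
    }))
```

The field names in the message have to match both the mapping file and schema.xml, otherwise Solr will reject or ignore the field.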
Taxonomy and Faceting
I have been digging through the user lists for Solr and Nutch, as well as reading lots of blogs, etc. I have yet to find a clear answer (maybe there is none).

I am trying to find the best way ahead for choosing a technology that can use a large taxonomy to classify structured and unstructured data and then display those categorizations as facets to the user during search.

There seem to be several approaches. Some of them encode the terms found in the text at index time, but I have seen no mention of HOW to get those terms from the text. Some sort of text classification software, I am assuming. If this is true, are there any good open source engines that can process text against a taxonomy?

The other approach seems to be two patches being developed for Solr 3.0, SOLR-792 and SOLR-64. Again, I think you would need some sort of engine to give you this information so that it could be added at index time.

I have also seen some interesting literature on using Drupal and the Solr module. My current architecture uses Nutch (1.2) for crawling, solrindex for indexing (Solr 1.4.1), and Ajax Solr for my UI.

I have also seen some talk in webinars, etc. from Lucid Imagination about upcoming development on Native Taxonomy Facets. Any idea where that development stands? I have to use the most stable version of Solr/Nutch/Lucene possible for my implementation because, unfortunately, once I choose, going back will be next to impossible for years to come!

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2028442.html
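To make the "how to get those terms from the text" question concrete, here is a deliberately naive sketch: a dictionary lookup that reports which taxonomy phrases occur in a piece of text. It is not a real classification engine and none of the tools named in this thread work exactly this way; the `find_taxonomy_terms` helper and the sample taxonomy are invented for illustration.

```python
import re


def find_taxonomy_terms(text, taxonomy):
    """Return the taxonomy phrases that occur in text as whole words.

    taxonomy maps a lowercase phrase (single spaces between words) to its
    category path, e.g. "solar power" -> ["Energy", "Renewable", "Solar"].
    """
    # Lowercase, drop punctuation, and join tokens with single spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    padded = " " + normalized + " "
    hits = {}
    for phrase, path in taxonomy.items():
        # The surrounding spaces enforce whole-word matching.
        if " " + phrase + " " in padded:
            hits[phrase] = path
    return hits


if __name__ == "__main__":
    taxonomy = {
        "solar power": ["Energy", "Renewable", "Solar"],
        "wind": ["Energy", "Renewable", "Wind"],
    }
    print(find_taxonomy_terms("Solar power beats wind today.", taxonomy))
```

Exact phrase matching misses synonyms, inflections, and ambiguous terms, which is where a real entity or classification engine would come in; the matched paths could then be written into the document before it is indexed.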
Re: Taxonomy and Faceting
Thanks for the quick response! I was thinking more about the idea of having both structured and unstructured data coming into a system to be indexed and searched. I would like these documents to go through some sort of entity/keyword/semantic processing.

I have a well-defined taxonomy for my organization (it is quite large), and at the moment we use RetrievalWare to give keyword/classification suggestions. This does NOT work well, though, and RetrievalWare is pretty much useless to us.

I want a way to do this either at index time or at search time. All documents should be processed against this taxonomy. I do not want the user to be able to nominate keywords; it must happen automatically. I am assuming it is only natural for these keywords/taxonomy entities to show up as hierarchical facets?

From what I can tell, there is no way to tell Solr: here is my taxonomy, classify my documents and give me back facets and facet counts.

--
View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2029636.html
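One commonly described workaround for hierarchical facets on a plain multivalued string field is to index depth-prefixed path tokens and drill down with a prefix filter, in the spirit of Solr's facet.prefix parameter. The sketch below only simulates that idea in memory; it does not call Solr, and the `facet_tokens` and `facet_counts` helpers and the sample paths are assumptions for illustration.

```python
from collections import Counter


def facet_tokens(path):
    """Expand a category path into one depth-prefixed token per level."""
    tokens = []
    for depth in range(len(path)):
        tokens.append(str(depth) + "/" + "/".join(path[: depth + 1]))
    return tokens


def facet_counts(docs, prefix):
    """Count documents per token starting with prefix.

    docs is a list of token lists, one list per document. Each token is
    counted at most once per document.
    """
    counts = Counter()
    for tokens in docs:
        for token in set(tokens):
            if token.startswith(prefix):
                counts[token] += 1
    return dict(counts)


if __name__ == "__main__":
    docs = [
        facet_tokens(["Energy", "Renewable", "Solar"]),
        facet_tokens(["Energy", "Renewable", "Wind"]),
        facet_tokens(["Energy", "Fossil"]),
    ]
    print(facet_counts(docs, "0/"))         # top-level categories
    print(facet_counts(docs, "1/Energy/"))  # children of Energy
```

The classification step that produces the paths still has to happen outside Solr; this only covers how already-assigned categories could be displayed and counted as a hierarchy.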
Re: Local Solr, Spatial Search, and LatLonType clarification
Thank you!! I am not sure I can take advantage of this, however, because my entry point into the Solr index is through Nutch. I have to map the Nutch-to-Solr fields in the solrindex-mapping.xml file, and I am not sure if dynamic fields are supported :-(

Which version of Solr offers the most geospatial support? I also need to think about compatibility with Nutch, though; that might be a stopping point for me.

--
View this message in context: http://lucene.472066.n3.nabble.com/Local-Solr-Spatial-Search-and-LatLonType-clarification-tp1609570p1614309.html
Local Solr, Spatial Search, and LatLonType clarification
I have been reading through all the JIRA issues and patches, as well as the wikis, and I still have a few things that are not clear to me. I am currently running Solr 1.4.1 and using Nutch for my crawling. Everything is working great, and I am using a Nutch plugin to add lat/lon information; I just don't know if it is possible to do what I want to do.

1. I noticed it is stated that the LatLonType field type cannot be multivalued. Does that mean that I cannot have multiple lat/lon values for one document? If so, that would be quite a limitation. I have an average of 10 geotags per document.

2. Are LocalSolr and SpatialSearch the same thing?

3. If I did want to use the LatLonType with the BBOX filter, where would I go to get a patch for 1.4.1? Is it even possible to patch 1.4.1, or do I have to go to an entirely different dev version of Solr?

Thanks for your input!!!

--
View this message in context: http://lucene.472066.n3.nabble.com/Local-Solr-Spatial-Search-and-LatLonType-clarification-tp1609570p1609570.html
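To illustrate what a bounding-box filter over several geotags per document amounts to, here is a small standalone sketch. It is not Solr's implementation: it uses a simple spherical approximation, ignores the poles and the antimeridian, and the `bounding_box` and `any_point_in_box` helpers and the sample coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def bounding_box(lat, lon, distance_km):
    """Return (min_lat, min_lon, max_lat, max_lon) around a center point.

    Rough approximation only: not valid near the poles or across the
    antimeridian.
    """
    lat_delta = math.degrees(distance_km / EARTH_RADIUS_KM)
    lon_delta = math.degrees(
        distance_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
    )
    return (lat - lat_delta, lon - lon_delta, lat + lat_delta, lon + lon_delta)


def any_point_in_box(points, box):
    """True if at least one (lat, lon) pair of a document lies in the box."""
    min_lat, min_lon, max_lat, max_lon = box
    return any(
        min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
        for lat, lon in points
    )


if __name__ == "__main__":
    box = bounding_box(38.9, -77.0, 10.0)
    geotags = [(40.7, -74.0), (38.95, -77.02)]  # several tags on one document
    print(any_point_in_box(geotags, box))
```

A document with about 10 geotags matches as soon as any one of them falls inside the box, which is the behavior a multivalued location field would need to provide.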