Format content field

2012-04-26 Thread webdev1977
Greetings all!

I have created an enterprise search architecture that includes both Nutch for
crawling and Solr for indexing.  I was so focused on the Nutch part
that I didn't realize that my user interface (jQuery based) was lacking in
appeal.

One of my issues is the format of the text in the content field.  Is there
any way to force it to keep spaces, etc. between pieces of text?

For instance, this is an example of a value:

thereisno way to know.Next sentence goes here.BUT I am all squished  

This is sample content from an HTML page.
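
One plausible cause of squished text like this is that the text nodes of adjacent block elements are concatenated with no separator. The following is a minimal Python sketch of the general idea, not Nutch's actual parser: it inserts a space at every block-level tag boundary and then collapses extra whitespace. The set of tags in BLOCK_TAGS and the names SpacedTextExtractor and extract_text are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags treated as block-level in this sketch (an assumption; extend as needed).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class SpacedTextExtractor(HTMLParser):
    """Collects text and inserts a space at every block-level tag boundary."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html):
    """Return the visible text of an HTML fragment with words kept apart."""
    parser = SpacedTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs left over from markup and the inserted spaces.
    return " ".join("".join(parser.parts).split())
```

Note that this sketch does not skip script or style content; a real extractor would need to handle that as well.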



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Format-content-field-tp3941336p3941336.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Taxonomy and Faceting

2010-12-13 Thread webdev1977

Based on this:

  <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
  <concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
  <lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
  <cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
  <entities_apikey>VALID_ALCHEMYAPI_KEY</entities_apikey>
  <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>


...this can't be used unless you use some sort of processing engine?  I am
playing around with some other open source tagging software, but I have yet
to get very far.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2079148.html


Re: Taxonomy and Faceting

2010-12-08 Thread webdev1977

Any luck with a tutorial?  :-)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2040246.html


Re: Taxonomy and Faceting

2010-12-07 Thread webdev1977

Can someone enlighten me on how to get started with this patch?  I am running
Solr 1.4.1, and obviously I need to download the latest trunk and apply the
patch.  But after that, I am sort of clueless.  I am assuming there are
some things that have to happen in the Solr config and schema files.

Reading the code now... 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2033563.html


Re: Taxonomy and Faceting

2010-12-07 Thread webdev1977

That would be AMAZING!! And much appreciated ;-)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2033657.html


Re: Solr Newbie - need a point in the right direction

2010-12-07 Thread webdev1977

In my experience, the hardest (but most flexible) part is exactly what was
mentioned: processing the data.  Nutch does have a really easy plugin
interface that you can use, and the example plugin is a great place to
start.  Once you have the raw parsed text, you can do whatever you want
with it.  For example, I wrote a plugin to add geospatial information to my
NutchDocument.  You then map the fields you added in the NutchDocument to
something you want to have Solr index.  In my case I created a geography
field where I put lat, lon info.  Then you create that same geography field
in the Nutch-to-Solr mapping file as well as in your Solr schema.xml file.
Then, when you run the crawl and tell it to use solrindex, it will send the
document to Solr to be indexed.  Since you have your new field in the
schema, Solr knows what to do with it at index time.  Now you can build a
user interface around what you want to do with that field.
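
The mapping step described above can be pictured with a small Python sketch. This is only a conceptual model, not Nutch code: the real mapping lives in solrindex-mapping.xml, and the FIELD_MAPPING entries, the to_solr_document helper, and the sample field names are all hypothetical.

```python
# Hypothetical field mapping in the spirit of solrindex-mapping.xml:
# keys are field names on the Nutch side, values are Solr schema field names.
FIELD_MAPPING = {
    "url": "id",
    "title": "title",
    "content": "content",
    "geography": "geography",  # the custom field added by the plugin
}


def to_solr_document(nutch_doc, mapping=FIELD_MAPPING):
    """Return a dict containing only the mapped fields, renamed for Solr."""
    return {
        dest: nutch_doc[source]
        for source, dest in mapping.items()
        if source in nutch_doc
    }
```

The point of the sketch is that a field the plugin adds only reaches Solr if it appears in the mapping, and Solr only accepts it if the destination name also exists in schema.xml.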


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Newbie-need-a-point-in-the-right-direction-tp2031381p2033687.html


Taxonomy and Faceting

2010-12-06 Thread webdev1977

I have been digging through the user lists for Solr and Nutch, as well as
reading lots of blogs, etc.  I have yet to find a clear answer (maybe there
is none).

I am trying to find the best way ahead for choosing a technology that will
let me use a large taxonomy to classify structured and unstructured data
and then display those categorizations as facets to the user during
search.

There seem to be several approaches, some of which encode the terms found
in the text at index time, but I have seen no mention of HOW to get those
terms from the text.  I am assuming some sort of text classification
software is needed.  If this is true, are there any good open source
engines that can process text against a taxonomy?
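
As a rough illustration of what "processing text against a taxonomy" could mean at its simplest, here is a Python sketch that matches single-word trigger terms and emits depth-prefixed path tokens, a common convention for hierarchical facets. It is not a real classification engine, and the TAXONOMY contents and the facet_paths and classify names are hypothetical.

```python
import re

# Hypothetical taxonomy: each trigger term maps to a path of category labels.
TAXONOMY = {
    "solr": ["Technology", "Search", "Solr"],
    "nutch": ["Technology", "Crawling", "Nutch"],
    "latitude": ["Geography", "Coordinates"],
}


def facet_paths(path):
    """Expand ['A', 'B'] into depth-prefixed tokens ['0/A', '1/A/B']."""
    return [f"{depth}/{'/'.join(path[:depth + 1])}" for depth in range(len(path))]


def classify(text, taxonomy=TAXONOMY):
    """Return sorted facet tokens for every taxonomy term found in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    tokens = set()
    for term, path in taxonomy.items():
        if term in words:
            tokens.update(facet_paths(path))
    return sorted(tokens)
```

The resulting tokens would be stored in a multivalued string field at index time. Exact word matching is a deliberate simplification; a real engine would also handle phrases, synonyms, and disambiguation.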

The other approach seems to be two patches being developed for Solr 3.0,
SOLR-792 and SOLR-64.  Again, I think you would need some sort of engine to
give you this information, which could then be added at index time.

I have also seen some interesting literature on using Drupal and the Solr
module.  

My current architecture uses Nutch (1.2) for crawling, solrindex for indexing
(Solr 1.4.1), and Ajax Solr for my UI.  

I have also seen some talk in webinars/etc from Lucid Imagination about
upcoming development on Native Taxonomy Facets, any idea where that
development stands?

I have to use the most stable version of Solr/Nutch/Lucene possible for my
implementation, because, unfortunately, once I choose, going back will be
next to impossible for years to come!

Thanks!




-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2028442.html


Re: Taxonomy and Faceting

2010-12-06 Thread webdev1977

Thanks for the quick response!  

I was thinking more about the idea of having both structured and unstructured
data coming into a system to be indexed/searched.  I would like these
documents to be processed by some sort of entity/keyword/semantic
processing.  I have a well-defined taxonomy for my organization (it is quite
large) and at the moment we use RetrievalWare to give keyword/classification
suggestions.  This does NOT work well though, and RetrievalWare is pretty
much useless to us.  

I want a way to do this process either at index time or search time.  All
documents should be processed against this taxonomy.  I do not want the user
to be able to nominate keywords; it must happen automatically.  I am
assuming it is only natural for these keywords/taxonomy entities to show up
as hierarchical facets?

From what I can tell, there is no way to tell Solr: "here is my taxonomy,
classify my documents and give me back facets and facet counts."
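
If the classification happens outside Solr and the results are indexed into a field, the facets and counts can then be requested with Solr's standard facet parameters. A minimal Python sketch of building such a request URL follows; the base URL, the "taxonomy" field name, and the depth-prefixed token format in the usage note are assumptions, while facet, facet.field, facet.mincount, and facet.prefix are regular Solr request parameters.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_field, prefix=None):
    """Build a Solr select URL that returns facet counts and no documents."""
    params = [
        ("q", query),
        ("rows", "0"),            # only the facet counts are wanted
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),  # hide categories with no matching documents
    ]
    if prefix is not None:
        # Restrict counts to one level or branch of the hierarchy.
        params.append(("facet.prefix", prefix))
    return f"{base_url}/select?{urlencode(params)}"
```

For example, facet_query_url("http://localhost:8983/solr", "*:*", "taxonomy", "1/Technology/") would ask for the counts of the second-level categories under a hypothetical "Technology" branch.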
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2029636.html


Re: Local Solr, Spatial Search, and LatLonType clarification

2010-10-01 Thread webdev1977

Thank you!!

I am not sure if I can take advantage of this, however, because my entry
point into the Solr index is through Nutch.  I have to map the Nutch-to-Solr
fields in the solrindex-mapping.xml file.  I am not sure if dynamic
fields are supported :-(  

What version of Solr should I look at using that offers the most geospatial
support?  I also need to think about the compatibility with Nutch, though;
that might be a stopping point for me.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Local-Solr-Spatial-Search-and-LatLonType-clarification-tp1609570p1614309.html


Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread webdev1977

I have been reading through all the JIRA issues and patches, as well as the
wikis, and I still have a few things that are not clear to me. 

I am currently running with Solr 1.4.1 and using Nutch for my crawling. 
Everything is working great, and I am using a Nutch plugin to add lat/lon
information.  I just don't know if it is possible to do what I am wanting to
do. 

1.  I noticed that it said that the LatLonType can not be multivalued.
Does that mean that I can not have multiple lat/lon values for one
document?  If so, that would be quite a limitation.  I have an average
of 10 geotags per document. 

2. Are LocalSolr and SpatialSearch the same thing?   

3. If I did want to use the LatLonType with the BBOX filter, where would I
go to get a patch for 1.4.1? Is it even possible to patch 1.4.1, or do I have
to go to an entirely different dev version of Solr? 
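
For context on what a bounding-box filter computes, here is a Python sketch that derives the box around a point and expresses it as plain range queries over two numeric fields. This is not the LatLonType or the bbox filter itself; it is an illustration under assumptions: the field names "lat" and "lon" are hypothetical, both fields are taken to be single-valued, and the arithmetic ignores the poles and the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def bounding_box(lat, lon, distance_km):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees around a point."""
    lat_delta = math.degrees(distance_km / EARTH_RADIUS_KM)
    # Longitude degrees shrink with latitude; this breaks down near the poles.
    lon_delta = math.degrees(
        distance_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
    )
    return lat - lat_delta, lat + lat_delta, lon - lon_delta, lon + lon_delta


def range_filter(lat, lon, distance_km, lat_field="lat", lon_field="lon"):
    """Build a range filter string over two numeric fields (names hypothetical)."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, distance_km)
    return (
        f"{lat_field}:[{min_lat:.4f} TO {max_lat:.4f}] AND "
        f"{lon_field}:[{min_lon:.4f} TO {max_lon:.4f}]"
    )
```

With several geotags per document, separate multivalued lat and lon fields would lose the pairing between each latitude and its longitude, so this sketch does not address question 1.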

Thanks for your input!!! 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Local-Solr-Spatial-Search-and-LatLonType-clarification-tp1609570p1609570.html

