Hi everyone,

 

I'm new, which won't be hard to figure out after I ask this question:

 

I use Drupal/Solr/Nutch

 

http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/schema.xml?view=markup

 

Solr specific:

How do I index specific content only? I am starting a legal index
specifically geared toward law students and lawyers. I am crawling law-related
sites, but I really don't want to index law firms, just the law content on
places like:

http://www.ecasebriefs.com/blog/law/

http://www.lawnix.com/cases/cases-index/

http://www.oyez.org/

http://www.4lawnotes.com/

http://www.docstoc.com/documents/education/law-school/case-briefs

http://www.lawschoolcasebriefs.com/

http://dictionary.findlaw.com/

 

As I was saying, while crawling I get all kinds of extraneous information put
into the Solr index. How do I combat that?

 

I am assuming (cough) that I can do this, but I am really at a loss as to
where to start looking to get it done. I prefer to learn, and I definitely
don't want to waste anyone's time.

 

Non-Solr specific:

Does anyone here help with Nutch, or is this list Solr only?

 

I am sorry if I am asking elementary questions or asking in the wrong
place. I just need to be pointed to the right place. I'm sort of
lost. (Imagine that.)

 

Thanks

 

Eric

 

 

 
