Indexing HTML Metatags Nutch - SOLR

2020-01-18 Thread kra...@gds2.de
Hello, I have been trying this for several days without success (Nutch 1.16, Solr 7.3.1). I have followed the description at https://cwiki.apache.org/confluence/display/nutch/IndexMetatags and put my nutch-site.xml file below. I have created the core following this description:
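For anyone unsure what the IndexMetatags setup is supposed to extract, here is a minimal Python sketch of that step using only the standard library's `html.parser`. It is not Nutch code: `MetaTagExtractor` and `extract_metatags` are hypothetical names, and the `metatag.` field prefix is assumed from the naming used on the IndexMetatags wiki page. It collects `<meta name="..." content="...">` values for the requested names, which is roughly the data that then has to reach matching fields in the Solr core.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> values for selected names.

    Hypothetical helper, not Nutch code: the "metatag." prefix mimics the
    field names used on the IndexMetatags wiki page (assumption).
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = {name.lower() for name in wanted}
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        content = attr_map.get("content")
        if name in self.wanted and content is not None:
            self.fields["metatag." + name] = content.strip()


def extract_metatags(html, wanted=("description", "keywords")):
    parser = MetaTagExtractor(wanted)
    parser.feed(html)
    parser.close()
    return parser.fields


page = (
    '<html><head><meta name="Description" content=" Example page ">'
    '<meta name="keywords" content="nutch, solr"/>'
    '<meta name="robots" content="index"></head></html>'
)
print(extract_metatags(page))
# {'metatag.description': 'Example page', 'metatag.keywords': 'nutch, solr'}
```

Meta tags whose names are not requested (here `robots`) are ignored, so only the configured names produce fields.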

Re: Nutch+Solr

2018-10-08 Thread Bineesh
This is solved. Nutch 1.15 has an index-writers.xml file in which we can pass the username and password for indexing to Solr.

Re: Nutch+Solr

2018-10-03 Thread Terry Steichen
Bineesh, I don't use Nutch, so I don't know if this is relevant, but I've had similar-sounding failures when making and restoring backups. The solution for me was to deactivate authentication while the backup was being done, and then activate it again afterwards. Then everything was restored

Nutch+Solr

2018-10-03 Thread Bineesh
Hello, We use Solr 7.3.1 and Nutch 1.15. We've enabled authentication for our SolrCloud setup using the basic auth plugin (login details: solr/SolrRocks). For Nutch to index data to Solr, the properties below were added to the nutch-site.xml file: solr.auth true Whether to enable HTTP
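The basic auth plugin relies on HTTP Basic authentication, so the credentials have to travel in an `Authorization` header on every indexing request. A minimal Python sketch of how such a header is built (RFC 7617) from the solr/SolrRocks login mentioned in the post; the helper `basic_auth_header` and the core URL `http://localhost:8983/solr/nutch/update` are hypothetical, and this is not how Nutch itself is implemented. It may help when checking with a separate client whether Solr accepts the credentials at all.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value (RFC 7617)."""
    token = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(token).decode("ascii")


# Hypothetical core URL; adjust host, port and core name to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/nutch/update"

request = urllib.request.Request(
    SOLR_UPDATE_URL,
    headers={"Authorization": basic_auth_header("solr", "SolrRocks")},
)
# The request is only built here, not sent; urllib.request.urlopen(request)
# would send it to a running Solr instance.
print(request.get_header("Authorization"))  # Basic c29scjpTb2xyUm9ja3M=
```

Note that Basic credentials are only base64-encoded, not encrypted, so they should be sent over HTTPS outside a test setup.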

Nutch + Solr - Indexer causes java.lang.OutOfMemoryError: Java heap space

2014-09-07 Thread glumet
a server side commit.

Re: document id in nutch/solr

2013-06-24 Thread alxsss
Another way of overriding Nutch fields is to modify the solrindex-mapping.xml file. HTH, Alex. -----Original Message----- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Sun, Jun 23, 2013 12:04 pm Subject: Re: document id in nutch/solr Add
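To make the mapping-file suggestion concrete, here is a minimal Python sketch, using only `xml.etree.ElementTree`, of how `<field dest="..." source="..."/>` entries of the kind found in solrindex-mapping.xml would rename Nutch fields before they reach Solr, including the url-to-id mapping asked about in this thread. The mapping below is a hypothetical, shortened example rather than the file shipped with any particular Nutch version; the function names are illustrative, and the rule that unmapped fields keep their original names is an assumption about Nutch's behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened mapping in the style of solrindex-mapping.xml.
# Check the file shipped with your Nutch version for the real entries.
MAPPING_XML = """
<mapping>
  <fields>
    <field dest="content" source="content"/>
    <field dest="title" source="title"/>
    <field dest="id" source="url"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
"""


def load_mapping(xml_text):
    """Return ({source: dest}, unique_key) parsed from the mapping XML."""
    root = ET.fromstring(xml_text)
    renames = {f.get("source"): f.get("dest") for f in root.iter("field")}
    return renames, root.findtext("uniqueKey")


def apply_mapping(nutch_doc, renames):
    """Rename mapped fields; unmapped fields keep their name (assumption)."""
    return {renames.get(name, name): value for name, value in nutch_doc.items()}


renames, unique_key = load_mapping(MAPPING_XML)
solr_doc = apply_mapping(
    {"url": "http://example.com/", "title": "Example", "anchor": "home"},
    renames,
)
print(unique_key, solr_doc)
```

With this mapping the page URL ends up in the `id` field, which the Solr schema declares as the unique key, so re-indexing the same URL replaces the earlier document instead of adding a duplicate.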

Re: document id in nutch/solr

2013-06-23 Thread Joe Zhang
Can somebody help with this one, please? On Fri, Jun 21, 2013 at 10:36 PM, Joe Zhang smartag...@gmail.com wrote: A quite standard configuration of nutch seems to automatically map url to id. Two questions: - Where is such mapping defined? I can't find it anywhere in nutch-site.xml or

Re: document id in nutch/solr

2013-06-23 Thread Jack Krupansky
- From: Joe Zhang Sent: Sunday, June 23, 2013 2:35 PM To: solr-user@lucene.apache.org Subject: Re: document id in nutch/solr Can somebody help with this one, please? On Fri, Jun 21, 2013 at 10:36 PM, Joe Zhang smartag...@gmail.com wrote: A quite standard configuration of nutch seems

document id in nutch/solr

2013-06-21 Thread Joe Zhang
A quite standard configuration of nutch seems to automatically map url to id. Two questions: - Where is such mapping defined? I can't find it anywhere in nutch-site.xml or schema.xml. The latter does define the id field as well as its uniqueness, but not the mapping. - Given that nutch has

spellchecking in nutch solr

2011-09-01 Thread alxsss
Hello, I have tried to implement an index-based spellchecker in nutch-solr by adding a spell field to schema.xml and making it a copy of the content field. However, this doubled the size of the data folder, and the spell field, as a copy of the content field, appears in the XML feed, which is not necessary

Assistance required fine-tuning nutch/solr - (paid work)

2010-11-12 Thread Jean-Luc
I require the expertise of a developer who can assist with fine-tuning my nutch/solr setup. I have the basics working but I think I probably need a custom nutch plugin written. If you're interested please contact me: jeanluct [at] gmail . com Hope it's ok to post this here - I'm not a recruiter

Nutch/Solr

2010-09-07 Thread Yavuz Selim YILMAZ
I tried to combine Nutch and Solr and want to ask something. After crawling, Nutch has certain fields such as content, tstamp, and title. How can I map the content field after crawling? Do I have to change the Lucene code (for example, add an extra field)? Or can this be handled at the Solr stage? Any suggestions? Thanks. -- Yavuz

Re: Nutch/Solr

2010-09-07 Thread Markus Jelsma
It depends on your version of Nutch. At least trunk and 1.1 obey the solrmapping.xml file in Nutch's configuration directory. I'd suggest you start with that mapping file and the Solr schema.xml file shipped with Nutch, as it exactly matches the mapping file. Just restart Solr with the new

Re: Nutch/Solr

2010-09-07 Thread Yavuz Selim YILMAZ
In fact, I used Nutch 0.9, but I am thinking of moving to the new version. If anybody has done something like that, I want to learn from their experience. When indexing an XML file, there are specific fields, and they all depend on one another, so duplicates don't happen. I want to extract specific

Re: Nutch/Solr

2010-09-07 Thread Markus Jelsma
You should: - definitely upgrade to 1.1 (1.2 is on the way), and - subscribe to the Nutch mailing list for Nutch-specific questions. On Tuesday 07 September 2010 10:36:58 Yavuz Selim YILMAZ wrote: In fact, I used nutch 0.9 version, but thinking of passing the new version. If anybody did

Re: Nutch - Solr latest?

2008-06-25 Thread Chris Hostetter
: I'm curious, is there a spot / patch for the latest on Nutch / Solr : integration? I've found a few pages (a few outdated, it seems). It would be nice : (?) if it worked as a DataSource type for DataImportHandler, but not sure if : that fits w/ how it works. Either way a nice contrib patch the way

Nutch - Solr latest?

2008-06-24 Thread Jon Baer
Hi, I'm curious, is there a spot / patch for the latest on Nutch / Solr integration? I've found a few pages (a few outdated, it seems). It would be nice (?) if it worked as a DataSource type for DataImportHandler, but not sure if that fits w/ how it works. Either way a nice contrib patch