index-replace: variable substitution?

2018-10-12 Thread Ryan Suarez
Greetings, I'm using binaries of nutch v1.15 with solr v7.3.1, and index-replace to copy a substring of the 'url' field to a new 'site' field. Here is the definition in my nutch-site.xml: index.replace.regexp urlmatch=.*www.mydomain.ca.*

Re: index-replace: variable substitution?

2018-10-24 Thread Ryan Suarez
e code of index-replace, it uses Java's Matcher.replaceAll <https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#replaceAll-java.lang.String->, so $1 (for example) should work. Yossi. -Original Message
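
As a rough illustration of the point above (that `$1` in a `Matcher.replaceAll` replacement string refers to the first capture group), here is a minimal standalone Java sketch. It is not taken from the index-replace plugin source; the class name `ReplaceAllDemo`, the helper `extractSite`, and the regular expression are assumptions made for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllDemo {

    // Hypothetical helper, not part of Nutch: reduces a URL to its host
    // by replacing the whole match with capture group 1.
    static String extractSite(String url) {
        // Group 1 captures everything between "://" and the next "/".
        Pattern pattern = Pattern.compile("^https?://([^/]+).*$");
        Matcher matcher = pattern.matcher(url);
        // "$1" in the replacement string is a back-reference to group 1.
        // If the pattern does not match, the input is returned unchanged.
        return matcher.replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(extractSite("http://www.mydomain.ca/path/page.html"));
    }
}
```

Whether the same pattern can be used unchanged in an `index.replace.regexp` definition depends on how the plugin parses that property, which this sketch does not cover.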

Re: Error Updating Solr

2019-02-28 Thread Ryan Suarez
Add this to your schema.xml: https://lucene.apache.org/solr/guide/6_6/dynamic-fields.html On Thu, 2019-02-28 at 16:45 -0600, Dave Beckstrom wrote: I'm getting much closer to getting Nutch and SOLR to play well together. (Ryan - thanks for your help on my last question. Your suggestion fixed

Re: Configuring Nutch to work with Solr?

2019-02-27 Thread Ryan Suarez
Try adding this to schema.xml: On Wed, 2019-02-27 at 15:49 -0600, Dave Beckstrom wrote: Hi Everyone, I'm a developer

Re: Direct Nutch crawler to use different SOLR index writer?

2019-03-02 Thread Ryan Suarez
# ./crawl -h
-D A Java property to pass to Nutch calls
/opt/nutch/bin/crawl -i -D solr.server.url=http://yourSOLR.com:8983/solr/collection1
/opt/nutch/bin/crawl -i -D solr.server.url=http://yourSOLR.com:8983/solr/collection2
On Fri, 2019-03-01 at 13:32 -0600, Dave Beckstrom wrote: This

Tracing crawled sites

2019-04-09 Thread Ryan Suarez
Greetings, We are running nutch v1.5 with SOLR v7.3.1. I would like to determine how a specific site was crawled: what were the parent links that the nutch crawler followed, all the way back to the root? Could someone let me know the best way to accomplish this? regards, Ryan

Re: IllegalArgumentException: No form exists: user-login-form

2019-07-09 Thread Ryan Suarez
ok, so the error message is quite clear. There is no form on that link you provided with an id or name of 'user-login-form'. On Mon, 2019-07-08 at 22:39 -0400, Susheel Kumar wrote: Hello Sebastian, Thanks for getting back. Here is the Login.html link which is throwing no form exists

multiple values encountered for non multiValued field keywords

2019-07-17 Thread Ryan Suarez
Greetings, I am trying to configure Nutch v1.15 and Solr v7.40 to index meta tags: https://wiki.apache.org/nutch/IndexMetatags However, I'm getting the following error: java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at

Re: multiple values encountered for non multiValued field keywords

2019-07-17 Thread Ryan Suarez
a. Note that for Solr 7.x, updating the schema.xml alone may not be sufficient; see https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search Let us know whether this works. Thanks! And we'll update the wiki page, resp. in the new wiki: https