Uncommenting <copyField source="rawcontent" dest="text"/> in schema.xml fixed the issue with SOLR.
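For anyone else hitting this: the error means a copyField's source must name a declared field (or a glob). A minimal sketch of the relevant schema.xml pair — the "binary" type for rawcontent is my assumption from the stock 2.x schema, so check it against your copy:

```xml
<!-- rawcontent must be declared before any copyField references it;
     otherwise Solr fails at core load with "copyField source
     :'rawcontent' is not a glob and doesn't match any explicit field
     or dynamicField". -->
<field name="rawcontent" type="binary" indexed="false" stored="true"/>

<!-- This copyField is only valid once its source field above exists. -->
<copyField source="rawcontent" dest="text"/>
```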
Now there are no error messages but also no parsing :(.

My seed.txt:
---------------------------------------------------------------------------
http://intranet.rand.org/eprm/rand-initiated-research/proposals/fy2015/index.html
http://intranet.rand.org/eprm/rand-initiated-research/2015.html
http://intranet.rand.org/eprm/rand-initiated-research/faq.html
http://intranet.rand.org/eprm/rand-initiated-research/index.html
---------------------------------------------------------------------------

My nutch-site.xml:
---------------------------------------------------------------------------
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>nutch Mongo Solr Crawler</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.mongodb.store.MongoStore</value>
    <description>Default class for storing data</description>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-(http|httpclient)|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|indexer-solr</value>
    <description>Regular expression naming plugin directory names to
    include. Any plugin not matching this expression is excluded.
    In any case you need at least include the nutch-extensionpoints plugin.
    By default Nutch includes crawling just HTML and plain text via HTTP,
    and basic indexing and search plugins. In order to use HTTPS please
    enable protocol-httpclient, but be aware of possible intermittent
    problems with the underlying commons-httpclient library.
    </description>
  </property>
</configuration>
---------------------------------------------------------------------------

My regex-urlfilter.txt:
---------------------------------------------------------------------------
# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
# for a more extensive coverage use the urlfilter-suffix plugin
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/

# accept anything else
+.
---------------------------------------------------------------------------

I see these warnings in my hadoop.log:

2015-09-30 17:32:53,466 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-09-30 17:32:54,571 WARN conf.Configuration - file:/tmp/hadoop-sdrulea/mapred/staging/sdrulea1728069154/.staging/job_local1728069154_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2015-09-30 17:32:54,573 WARN conf.Configuration - file:/tmp/hadoop-sdrulea/mapred/staging/sdrulea1728069154/.staging/job_local1728069154_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2015-09-30 17:32:54,652 WARN conf.Configuration - file:/tmp/hadoop-sdrulea/mapred/local/localRunner/sdrulea/job_local1728069154_0001/job_local1728069154_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2015-09-30 17:32:54,654 WARN conf.Configuration - file:/tmp/hadoop-sdrulea/mapred/local/localRunner/sdrulea/job_local1728069154_0001/job_local1728069154_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.

Any ideas?

On 9/30/15, 5:21 PM, "Drulea, Sherban" <[email protected]> wrote:

>Hi Lewis,
>
>On 9/30/15, 11:05 AM, "Lewis John Mcgibbney" <[email protected]>
>wrote:
>
>>Hi Sherban,
>>
>>On Wed, Sep 30, 2015 at 6:46 AM, <[email protected]> wrote:
>>
>>> I tried with SOLR 4.9.1.
>>
>>OK. As I said Solr 4.6 is supported but never mind.
>
>OK. I'm using SOLR 4.6.0.
>
>I replaced solr-4.6.0/example/solr/collection1/conf/schema.xml with the
>file from https://github.com/apache/nutch/blob/2.x/conf/schema.xml.
>
>When I start SOLR 4.6.0 with "java -jar start.jar", I get this error:
>
>1094 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.update.SolrIndexConfig IndexWriter infoStream solr logging is enabled
>1097 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrConfig Using Lucene MatchVersion: LUCENE_46
>1160 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.Config Loaded SolrConfig: solrconfig.xml
>1164 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.schema.IndexSchema Reading Solr Schema from schema.xml
>1176 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.schema.IndexSchema [collection1] Schema name=nutch
>1241 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.schema.IndexSchema default search field in schema is text
>1242 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.schema.IndexSchema query parser default operator is OR
>1242 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.schema.IndexSchema unique key field: id
>1243 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer Unable to create core: collection1
>org.apache.solr.common.SolrException: copyField source :'rawcontent' is
>not a glob and doesn't match any explicit field or dynamicField..
>Schema file is /Users/sdrulea/Downloads/solr-4.6.0/example/solr/collection1/schema.xml
> at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:608)
> at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
> at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
> at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
> at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:554)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:592)
> at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:271)
> at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>Caused by: org.apache.solr.common.SolrException: copyField source
>:'rawcontent' is not a glob and doesn't match any explicit field or
>dynamicField.
> at org.apache.solr.schema.IndexSchema.registerCopyField(IndexSchema.java:855)
> at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:592)
> ...
> 13 more
>
>1245 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer null:org.apache.solr.common.SolrException: Unable to create core: collection1
> at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:977)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:601)
> at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:271)
> at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>Caused by: org.apache.solr.common.SolrException: copyField source
>:'rawcontent' is not a glob and doesn't match any explicit field or
>dynamicField.. Schema file is
>/Users/sdrulea/Downloads/solr-4.6.0/example/solr/collection1/schema.xml
> at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:608)
> at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
> at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
> at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
> at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:554)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:592)
> ... 8 more
>Caused by: org.apache.solr.common.SolrException: copyField source
>:'rawcontent' is not a glob and doesn't match any explicit field or
>dynamicField.
> at org.apache.solr.schema.IndexSchema.registerCopyField(IndexSchema.java:855)
> at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:592)
> ...
> 13 more
>
>1247 [main] INFO org.apache.solr.servlet.SolrDispatchFilter user.dir=/Users/sdrulea/Downloads/solr-4.6.0/example
>1247 [main] INFO org.apache.solr.servlet.SolrDispatchFilter SolrDispatchFilter.init() done
>1263 [main] INFO org.eclipse.jetty.server.AbstractConnector Started [email protected]:8983
>
>The only changes I made to schema.xml were to comment out the lines with
>"protwords.txt" as the tutorial suggested. Has anyone tested the 2.3.1
>schema.xml with SOLR 4.6.1?
>
>>> I copied /release-2.3.1/runtime/local/conf/schema.xml to
>>> solr-4.9.1/example/solr/collection1/conf/schema.xml
>>
>>Good.
>>
>>> Result of /release-2.3.1/runtime/local/bin/crawl urls method_centers
>>> http://localhost:8983/solr 2
>>>
>>> InjectorJob: total number of urls rejected by filters: 1
>>
>>Notice that your regex urlfilter is rejecting one of your seed URLs.
>
>One of my original URLs ended with "/". I added index.html and that fixed
>the rejection.
>
>InjectorJob: total number of urls rejected by filters: 0
>InjectorJob: total number of urls injected after normalization and
>filtering: 11
>
>>> InjectorJob: total number of urls injected after normalization and
>>> filtering: 5
>>
>>[...snip]
>>
>>> GeneratorJob: generated batch id: 1443556518-1067112789 containing 0 URLs
>>> Generate returned 1 (no new segments created)
>>> Escaping loop: no more URLs to fetch now
>>>
>>> There are 6 URLs in my urls/seeds.txt file. Why does it say 0 URLs?
>>
>>1 was rejected as explained above. Additionally, it seems like there is
>>also an error fetching your seeds and parsing out hyperlinks. I would
>>encourage you to check the early stages of configuring and prepping your
>>crawler. Some configuration is incorrect... possibly more problems with
>>your regex urlfilters.
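To rule the urlfilters in or out, I approximated them outside Nutch. This is only a rough Python sketch of my regex-urlfilter.txt, not the real urlfilter-regex plugin; it assumes Nutch's semantics of trying rules top to bottom, first match wins, '-' rejects, '+' accepts:

```python
import re

# Mirror of my regex-urlfilter.txt rules, in order ('-' reject, '+' accept).
RULES = [
    ('-', r'^(file|ftp|mailto):'),
    ('-', r'\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|'
          r'wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|'
          r'mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$'),
    ('-', r'[?*!@=]'),
    ('-', r'.*(/[^/]+)/[^/]+\1/[^/]+\1/'),
    ('+', r'.'),
]

def passes(url):
    """Return True if the first rule that matches the URL accepts it."""
    for sign, pattern in RULES:
        if re.search(pattern, url):
            return sign == '+'
    return False  # no rule matched: rejected by default

# My seed passes these rules...
assert passes('http://intranet.rand.org/eprm/rand-initiated-research/index.html')
# ...while obvious junk is still rejected:
assert not passes('mailto:[email protected]')
assert not passes('http://example.com/logo.png')
assert not passes('http://example.com/search?q=nutch')
```

Since the seeds pass this approximation, the 0-URL generate batch looks more like a fetch/storage problem than a filter problem.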
>
>My regex-urlfilter.txt is unmodified:
>
># skip file: ftp: and mailto: urls
>-^(file|ftp|mailto):
>
># skip image and other suffixes we can't yet parse
># for a more extensive coverage use the urlfilter-suffix plugin
>-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
>
># skip URLs containing certain characters as probable queries, etc.
>-[?*!@=]
>
># skip URLs with slash-delimited segment that repeats 3+ times, to break
>loops
>-.*(/[^/]+)/[^/]+\1/[^/]+\1/
>
># accept anything else
>+.
>
>I copied plugin.includes to local/conf/nutch-site.xml. I added httpclient &
>indexer-solr:
>
><property>
>  <name>plugin.includes</name>
>  <value>protocol-(http|httpclient)|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|indexer-solr</value>
>  <description>Regular expression naming plugin directory names to
>  include. Any plugin not matching this expression is excluded.
>  In any case you need at least include the nutch-extensionpoints plugin.
>  By default Nutch includes crawling just HTML and plain text via HTTP,
>  and basic indexing and search plugins. In order to use HTTPS please
>  enable protocol-httpclient, but be aware of possible intermittent
>  problems with the underlying commons-httpclient library.
>  </description>
></property>
>
>Nutch still doesn't parse any links. Any ideas?
>
>InjectorJob: total number of urls injected after normalization and
>filtering: 11
>
>/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch fetch -D
>mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D
>mapred.reduce.tasks.speculative.execution=false -D
>mapred.map.tasks.speculative.execution=false -D
>mapred.compress.map.output=true -D fetcher.timelimit.mins=180
>1443657910-4394 -crawlId method_centers -threads 50
>FetcherJob: starting at 2015-09-30 17:05:14
>FetcherJob: batchId: 1443657910-4394
>FetcherJob: threads: 50
>FetcherJob: parsing: false
>FetcherJob: resuming: false
>FetcherJob : timelimit set for : 1443668714323
>Using queue mode : byHost
>Fetcher: threads: 50
>QueueFeeder finished: total 0 records. Hit by time limit :0
>...
>Fetcher: throughput threshold sequence: 5
>0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
>
>-activeThreads=0
>Using queue mode : byHost
>Fetcher: threads: 50
>QueueFeeder finished: total 0 records. Hit by time limit :0
>...
>
>-finishing thread FetcherThread49, activeThreads=0
>0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
>
>Parsing:
>/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch parse -D
>mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D
>mapred.reduce.tasks.speculative.execution=false -D
>mapred.map.tasks.speculative.execution=false -D
>mapred.compress.map.output=true -D
>mapred.skip.attempts.to.start.skipping=2 -D
>mapred.skip.map.max.skip.records=1 1443657910-4394 -crawlId method_centers
>ParserJob: starting at 2015-09-30 17:05:27
>ParserJob: resuming: false
>ParserJob: forced reparse: false
>ParserJob: batchId: 1443657910-4394
>ParserJob: success
>ParserJob: finished at 2015-09-30 17:05:29, time elapsed: 00:00:02
>CrawlDB update for method_centers
>/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch updatedb -D
>mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D
>mapred.reduce.tasks.speculative.execution=false -D
>mapred.map.tasks.speculative.execution=false -D
>mapred.compress.map.output=true 1443657910-4394 -crawlId method_centers
>DbUpdaterJob: starting at 2015-09-30 17:05:30
>DbUpdaterJob: batchId: 1443657910-4394
>DbUpdaterJob: finished at 2015-09-30 17:05:32, time elapsed: 00:00:02
>Indexing method_centers on SOLR index -> http://localhost:8983/solr
>/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch index -D
>mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D
>mapred.reduce.tasks.speculative.execution=false -D
>mapred.map.tasks.speculative.execution=false -D
>mapred.compress.map.output=true -D
>solr.server.url=http://localhost:8983/solr -all -crawlId method_centers
>
>>> The index job worked but there's no data in SOLR. Is there a known good
>>> version of SOLR that works with 2.3.1 schema.xml? Are the tutorial
>>> instructions still valid?
>>
>>No, it did not. It failed. Look at the hadoop.log.
>>Also please look at your solr.log, it will provide you with better insight
>>into what is wrong with your Solr server and what messages are failing.
>>Thanks
>
>The nutch schema.xml doesn't work on my SOLR 4.6.0:
>
>IndexingJob: starting
>No IndexWriters activated - check your configuration
>IndexingJob: done.
>
>SOLR dedup -> http://localhost:8983/solr
>/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch solrdedup -D
>mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D
>mapred.reduce.tasks.speculative.execution=false -D
>mapred.map.tasks.speculative.execution=false -D
>mapred.compress.map.output=true http://localhost:8983/solr
>Exception in thread "main" org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>Expected content type application/octet-stream but got text/html;charset=ISO-8859-1. <html>
><head>
><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
><title>Error 500 {msg=SolrCore 'collection1' is not available due to init
>failure: copyField source :'rawcontent' is not a glob and doesn't match
>any explicit field or dynamicField.. Schema file is
>/Users/sdrulea/Downloads/solr-4.6.0/example/solr/collection1/schema.xml,trace=org.apache.solr.common.SolrException:
>SolrCore 'collection1' is not available due to init failure: copyField
>source :'rawcontent' is not a glob and doesn't match any explicit field
>or dynamicField..
>Schema file is /Users/sdrulea/Downloads/solr-4.6.0/example/solr/collection1/schema.xml
> at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:818)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:745)
>Caused by: org.apache.solr.common.SolrException: copyField source
>:'rawcontent' is not a glob and doesn't match any explicit field or
>dynamicField.. Schema file is
>/Users/sdrulea/Downloads/solr-4.6.0/example/solr/collection1/schema.xml
> at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:608)
> at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
> at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
> at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
> at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:554)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:592)
> at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:271)
> at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ...
> 1 more
>Caused by: org.apache.solr.common.SolrException: copyField source
>:'rawcontent' is not a glob and doesn't match any explicit field or
>dynamicField.
> at org.apache.solr.schema.IndexSchema.registerCopyField(IndexSchema.java:855)
> at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:592)
> ... 13 more
>
>Cheers,
>Sherban
>
>__________________________________________________________________________
>
>This email message is for the sole use of the intended recipient(s) and
>may contain confidential information. Any unauthorized review, use,
>disclosure or distribution is prohibited. If you are not the intended
>recipient, please contact the sender by reply email and destroy all copies
>of the original message.

