If the folder that holds the seed URL files was created after the OS
was upgraded to Mojave 10.14, the .DS_Store file inside that folder is
NOT visible to Nutch. If the folder was created before the upgrade, the
.DS_Store file inside that old folder is visible to Nutch.
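
One possible workaround for an older seed folder, offered only as a rough
sketch: delete the .DS_Store file before injecting, so that the seed lists
are the only files left in the folder. The helper name clean_seed_dir is
made up for illustration and is not part of Nutch. Finder may recreate the
file the next time the folder is opened, so this would have to be repeated.

```shell
# Hypothetical helper (not part of Nutch): remove Finder metadata files
# from a seed directory so that only the real seed lists remain.
clean_seed_dir() {
  seed_dir="$1"
  # -type f limits the match to regular files; -name matches the exact
  # file name, so seed files such as seed.txt are left untouched.
  find "$seed_dir" -type f -name '.DS_Store' -exec rm -f {} +
}
```

With the paths from the original message, this could be run as
clean_seed_dir dealfar/urls before calling bin/crawl again.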
On Mon, Oct 29, 2018 at 4:09 AM Junqiang Zhang <[email protected]> wrote:
>
> Hello,
>
> I recently upgraded the OS of my MacBook Pro to Mojave 10.14. I run
> and debug Apache Nutch on this MacBook Pro laptop. In the Apple macOS
> operating system, .DS_Store is a file that stores custom attributes of
> its containing folder. Before the OS upgrade, the .DS_Store file
> inside the Nutch seed folder was not visible to Nutch, and Nutch did
> not try to read seed URLs from it. After the upgrade, Nutch includes
> the .DS_Store file as a seed file, but then reports that this input
> path does not exist.
>
> The relevant log shown in Terminal after I ran the command "bin/crawl
> -i -w 30s -s dealfar/urls dealfar/crawl 1" is copied below. How can I
> run and debug Nutch on Mojave 10.14? Can this issue be fixed by
> modifying the Nutch source code? Thanks.
>
>
>
> XXXXXmbp:apache-nutch-1.15 XXXXX$ bin/crawl -i -w 30s -s dealfar/urls dealfar/crawl 1
> Time to wait (--wait) = 30 sec.
> Injecting seed URLs
> /Users/XXXXX/Documents/apache-nutch-1.15/bin/nutch inject dealfar/crawl/crawldb dealfar/urls
> Injector: starting at 2018-10-29 03:01:13
> Injector: crawlDb: dealfar/crawl/crawldb
> Injector: urlDir: dealfar/urls
> Injector: Converting injected urls to crawl db entries.
> Injecting seed URL file file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/.DS_Store
> Injecting seed URL file file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/seed.txt
> Injector job failed: Input path does not exist: file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/.DS_Store
> Injector: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/.DS_Store
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
> at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
> at org.apache.nutch.crawl.Injector.inject(Injector.java:436)
> at org.apache.nutch.crawl.Injector.run(Injector.java:570)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.nutch.crawl.Injector.main(Injector.java:535)
>
> Error running:
>   /Users/XXXXX/Documents/apache-nutch-1.15/bin/nutch inject dealfar/crawl/crawldb dealfar/urls
> Failed with exit value 255.
