Hi, I made the file invisible again. It works very well now. Thanks.

Best,
Junqiang

On Mon, Oct 29, 2018 at 8:08 PM Sebastian Nagel <[email protected]> wrote:
>
> Hi,
>
> thanks for the problem report. However, I would argue not to handle such
> specific cases inside Nutch, it makes the Nutch code extremely complex and
> requires extra efforts to be portable among operating systems.
>
> Why not just make the file invisible again?
>
> Or if this isn't possible:
> - write all seeds into a single file and
> - pass this single seed file to Injector
>   (the seed list can be both - a directory or a single file)
>
> Best,
> Sebastian
>
> On 10/28/18 9:58 PM, Junqiang Zhang wrote:
> > If a folder used to hold seed url links files is created after the OS
> > is upgraded to Mojave 10.14, the .DS_Store file inside the folder is
> > NOT visible to Nutch. If a folder was created before the upgrade, the
> > .DS_Store file inside this old folder is visible to Nutch.
> >
> > On Mon, Oct 29, 2018 at 4:09 AM Junqiang Zhang <[email protected]>
> > wrote:
> >>
> >> Hello,
> >>
> >> I recently upgraded the OS of my MacBook Pro to Mojave 10.14. I run
> >> and debug Apache Nutch on this MacBook Pro laptop. In the Apple macOS
> >> operating system, .DS_Store is a file that stores custom attributes of
> >> its containing folder. Before the upgrade of the OS, the .DS_Store
> >> file inside Nutch seed folder was not visible to Nutch, and Nutch did
> >> not try to read seed urls from this file. After the upgrade, Nutch
> >> includes the .DS_Store file as a seed file, but Nutch thinks this
> >> input path does not exist.
> >>
> >> The relevant log shown on Terminal after I ran the command "bin/crawl
> >> -i -w 30s -s dealfar/urls dealfar/crawl 1" is copied below. How can I
> >> run and debug Nutch on Mojave 10.14? Can this issue be fixed by
> >> modifying Nutch source code? Thanks.
> >>
> >>
> >>
> >> XXXXXmbp:apache-nutch-1.15 XXXXX$ bin/crawl -i -w 30s -s dealfar/urls dealfar/crawl 1
> >> Time to wait (--wait) = 30 sec.
> >> Injecting seed URLs
> >> /Users/XXXXX/Documents/apache-nutch-1.15/bin/nutch inject dealfar/crawl/crawldb dealfar/urls
> >> Injector: starting at 2018-10-29 03:01:13
> >> Injector: crawlDb: dealfar/crawl/crawldb
> >> Injector: urlDir: dealfar/urls
> >> Injector: Converting injected urls to crawl db entries.
> >> Injecting seed URL file file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/.DS_Store
> >> Injecting seed URL file file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/seed.txt
> >> Injector job failed: Input path does not exist: file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/.DS_Store
> >> Injector: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/.DS_Store
> >> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
> >> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
> >> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
> >> at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115)
> >> at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
> >> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
> >> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
> >> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
> >> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
> >> at java.security.AccessController.doPrivileged(Native Method)
> >> at javax.security.auth.Subject.doAs(Subject.java:422)
> >> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> >> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
> >> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
> >> at org.apache.nutch.crawl.Injector.inject(Injector.java:436)
> >> at org.apache.nutch.crawl.Injector.run(Injector.java:570)
> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> at org.apache.nutch.crawl.Injector.main(Injector.java:535)
> >>
> >> Error running:
> >> /Users/XXXXX/Documents/apache-nutch-1.15/bin/nutch inject dealfar/crawl/crawldb dealfar/urls
> >> Failed with exit value 255.
>
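
For anyone who cannot hide the file again, the single-seed-file workaround suggested in the thread could be sketched roughly as follows. This is only an illustration and not part of Nutch: the function name merge_seed_files and the output file name in the usage note are hypothetical, and the sketch assumes the seed files are plain UTF-8 text with one entry per line.

```python
from pathlib import Path


def merge_seed_files(seed_dir, merged_file):
    """Concatenate the visible seed files in seed_dir into merged_file.

    Hypothetical helper, not a Nutch API. Entries whose names start with
    a dot (for example .DS_Store) and anything that is not a regular file
    are skipped. Returns the number of non-empty lines written.
    """
    seed_dir = Path(seed_dir)
    merged_file = Path(merged_file)
    merged_file.parent.mkdir(parents=True, exist_ok=True)

    count = 0
    with merged_file.open("w", encoding="utf-8") as out:
        # Sort so the merged file is reproducible between runs.
        for path in sorted(seed_dir.iterdir()):
            if path.name.startswith(".") or not path.is_file():
                continue  # hidden file such as .DS_Store, or a subdirectory
            if path.resolve() == merged_file.resolve():
                continue  # never read the output file back into itself
            with path.open(encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if line:  # drop blank lines, keep everything else as-is
                        out.write(line + "\n")
                        count += 1
    return count
```

Assuming the merged file is written to a hypothetical path such as dealfar/seeds-merged.txt, that single file would then be passed in place of the dealfar/urls directory, since the seed list can be either a directory or a single file.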

