Hi,

I made the file invisible again. It works very well now. Thanks.

Best,
Junqiang
On Mon, Oct 29, 2018 at 8:08 PM Sebastian Nagel
<[email protected]> wrote:
>
> Hi,
>
> thanks for the problem report. However, I would argue against handling such
> specific cases inside Nutch: it makes the Nutch code extremely complex and
> requires extra effort to be portable across operating systems.
>
> Why not just make the file invisible again?
>
> Or if this isn't possible:
> - write all seeds into a single file and
> - pass this single seed file to Injector
>   (the seed list can be both - a directory
>    or a single file)
>
> Best,
> Sebastian
>
> On 10/28/18 9:58 PM, Junqiang Zhang wrote:
> > If a folder used to hold seed URL files is created after the OS
> > is upgraded to Mojave 10.14, the .DS_Store file inside the folder is
> > NOT visible to Nutch. If a folder was created before the upgrade, the
> > .DS_Store file inside this old folder is visible to Nutch.
> > On Mon, Oct 29, 2018 at 4:09 AM Junqiang Zhang <[email protected]> 
> > wrote:
> >>
> >> Hello,
> >>
> >> I recently upgraded the OS of my MacBook Pro to Mojave 10.14. I run
> >> and debug Apache Nutch on this MacBook Pro laptop. In the Apple macOS
> >> operating system, .DS_Store is a file that stores custom attributes of
> >> its containing folder. Before the upgrade of the OS, the .DS_Store
> >> file inside the Nutch seed folder was not visible to Nutch, and Nutch did
> >> not try to read seed URLs from this file. After the upgrade, Nutch
> >> includes the .DS_Store file as a seed file, but Nutch thinks this
> >> input path does not exist.
> >>
> >> The relevant log shown on Terminal after I ran the command "bin/crawl
> >> -i -w 30s -s dealfar/urls dealfar/crawl 1" is copied below. How can I
> >> run and debug Nutch on Mojave 10.14? Can this issue be fixed by
> >> modifying Nutch source code? Thanks.
> >>
> >>
> >>
> >> XXXXXmbp:apache-nutch-1.15 XXXXX$ bin/crawl -i -w 30s -s dealfar/urls dealfar/crawl 1
> >> Time to wait (--wait) = 30 sec.
> >> Injecting seed URLs
> >> /Users/XXXXX/Documents/apache-nutch-1.15/bin/nutch inject dealfar/crawl/crawldb dealfar/urls
> >> Injector: starting at 2018-10-29 03:01:13
> >> Injector: crawlDb: dealfar/crawl/crawldb
> >> Injector: urlDir: dealfar/urls
> >> Injector: Converting injected urls to crawl db entries.
> >> Injecting seed URL file file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/.DS_Store
> >> Injecting seed URL file file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/seed.txt
> >> Injector job failed: Input path does not exist: file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/.DS_Store
> >> Injector: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/XXXXX/Documents/apache-nutch-1.15/dealfar/urls/.DS_Store
> >> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
> >> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
> >> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
> >> at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115)
> >> at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
> >> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
> >> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
> >> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
> >> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
> >> at java.security.AccessController.doPrivileged(Native Method)
> >> at javax.security.auth.Subject.doAs(Subject.java:422)
> >> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> >> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
> >> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
> >> at org.apache.nutch.crawl.Injector.inject(Injector.java:436)
> >> at org.apache.nutch.crawl.Injector.run(Injector.java:570)
> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> at org.apache.nutch.crawl.Injector.main(Injector.java:535)
> >>
> >> Error running:
> >>   /Users/XXXXX/Documents/apache-nutch-1.15/bin/nutch inject dealfar/crawl/crawldb dealfar/urls
> >> Failed with exit value 255.
>
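
For anyone who cannot hide the .DS_Store file, the second suggestion quoted above (write all seeds into a single file and pass that file to the Injector) might be sketched roughly as follows. This is only a minimal sketch, not something taken from Nutch itself: the function name merge_seeds and the output file name used below are made up for illustration, and it assumes the output file lives outside the seed directory.

```shell
#!/bin/sh
# Minimal sketch: merge every visible seed file in a directory into one file.
# merge_seeds is a hypothetical helper name, not part of Nutch.
merge_seeds() {
  seed_dir=$1
  out_file=$2                  # assumed to be outside seed_dir
  : > "$out_file"              # create or truncate the merged seed file
  for f in "$seed_dir"/*; do   # the * glob skips dot-files such as .DS_Store
    [ -f "$f" ] || continue    # ignore subdirectories and an unmatched glob
    awk 1 "$f" >> "$out_file"  # awk 1 also adds a missing final newline
  done
}
```

With the paths from the log above, this could be used as "merge_seeds dealfar/urls dealfar/all-seeds.txt", followed by "bin/nutch inject dealfar/crawl/crawldb dealfar/all-seeds.txt" (all-seeds.txt being an arbitrary name chosen here).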
