My guess is that the file does not have the correct permissions while being copied.
[noland@localhost cp-test]$ cp -p test-0 test-1 & sleep 0.1; ls -al test* [1] 18780 -rw-rw-r-- 1 noland noland 1048576000 Nov 19 17:25 test-0 -rw------- 1 noland noland 52334592 Nov 19 17:27 test-1 For large files, it probably makes sense to copy the file in as .file and then rename it to file. Brock On Mon, Nov 19, 2012 at 5:04 PM, Patrick Wendell <[email protected]> wrote: > The spooling source gets a directory listing, then reads each file, then > renames it to X.COMPLETED. Is it possible some other process deleted that > file between when Flume listed the directory and when it tried to open the > file? Otherwise, I'm confused why the file would not be present in the > listing you give here. > > > On Mon, Nov 19, 2012 at 6:03 PM, Patrick Wendell <[email protected]> wrote: >> >> Hey Dan, >> >> You say that it seems like Flume has already processed the log... why do >> you think that? >> >> When you listed the directory contents I don't see the original or the >> COMPLETED version of the file that Flume is complaining about: >> >> /clickstream.log-2012-11-17-1353163623 >> >> doesn't appear in the >> >> /mnt/flume/clickstream/ >> >> directory listing anywhere. >> >> >> On Mon, Nov 19, 2012 at 2:33 PM, Dan Young <[email protected]> wrote: >>> >>> Hello Brock, >>> >>> It seems like we get this message each time that logrotate runs and is in >>> the process of copying the file to the SpoolingDirectory. It seems that >>> Flume starts reading the file as soon as it shows up in the >>> SpoolingDirectory..... Maybe it's trying to read the file while it's still >>> being written to???? >>> >>> 2012-11-19 19:27:27,924 (pool-12-thread-1) [WARN - >>> org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:328)] >>> Could not find file: >>> /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239 >>> java.io.FileNotFoundException: >>> /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239 (Permission >>> denied) >>> at java.io.FileInputStream.open(Native Method) >>> at java.io.FileInputStream.<init>(FileInputStream.java:138) >>> at java.io.FileReader.<init>(FileReader.java:72) >>> at >>> org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322) >>> at >>> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172) >>> at >>> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135) >>> at >>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>> at >>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) >>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) >>> at >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) >>> at >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) >>> at >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >>> at java.lang.Thread.run(Thread.java:722) >>> >>> >>> >>> >>> On Sat, Nov 17, 2012 at 9:15 AM, Brock Noland <[email protected]> wrote: >>>> >>>> Ok, do you mind sharing your log rotate config to see if we can >>>> reproduce? >>>> >>>> -- >>>> Brock Noland >>>> Sent with Sparrow >>>> >>>> On Saturday, November 17, 2012 at 10:01 AM, Dan Young wrote: >>>> >>>> Hey Brock, >>>> >>>> No I have not modified the conf while the agent was running. >>>> >>>> /mnt/flume is local. Note that this is running on an ec2 instance and >>>> the disk is the ephemeral drive, not EBS. >>>> >>>> Regards , >>>> >>>> Dano >>>> >>>> On Nov 17, 2012 8:58 AM, "Brock Noland" <[email protected]> wrote: >>>> >>>> Hi, >>>> >>>> I highly doubt it's related to >>>> (https://issues.apache.org/jira/browse/FLUME-1721) but have you >>>> modified the configuration file since starting the agent? If so, can >>>> you restart the agent and see if the error continues? >>>> >>>> Also, is /mnt/flume local disk or NAS? >>>> >>>> Brock >>>> >>>> On Sat, Nov 17, 2012 at 9:02 AM, Dan Young <[email protected]> wrote: >>>> > First a bit of context, I'm using logrotate to monitor and copy (cp >>>> > -p) log >>>> > files to a flume spooling directory source. So every hour, logrotate >>>> > checks >>>> > for and copies a file from the source to the flume destination. I see >>>> > the >>>> > following warning message in the flume logs. >>>> > >>>> > >>>> > 17 Nov 2012 14:47:07,682 WARN [pool-10-thread-1] >>>> > (org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile:328) >>>> > - >>>> > Could not find file: >>>> > /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623 >>>> > java.io.FileNotFoundException: >>>> > /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623 >>>> > (Permission >>>> > denied) >>>> > at java.io.FileInputStream.open(Native Method) >>>> > at java.io.FileInputStream.<init>(FileInputStream.java:138) >>>> > at java.io.FileReader.<init>(FileReader.java:72) >>>> > at >>>> > >>>> > org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322) >>>> > at >>>> > >>>> > org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172) >>>> > at >>>> > >>>> > org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135) >>>> > at >>>> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>> > at >>>> > >>>> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) >>>> > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) >>>> > at >>>> > >>>> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) >>>> > at >>>> > >>>> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) >>>> > at >>>> > >>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >>>> > at >>>> > >>>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >>>> > at java.lang.Thread.run(Thread.java:722) >>>> > >>>> > >>>> > Although it appears that Flume processes the log, I'm curious why I''m >>>> > seeing this and if I have anything with permissions incorrect? >>>> > >>>> > >>>> > >>>> > Here's the permissions: >>>> > >>>> > source log directory under /var/log: >>>> > drwxrwxr-x 2 ubuntu ubuntu 4096 Nov 17 14:47 clickstream >>>> > >>>> > source files: >>>> > -rw-rw-r-- 1 ubuntu ubuntu 9055750 Nov 17 13:29 >>>> > clickstream.log-2012-11-17-1353158953.gz >>>> > -rw-rw-r-- 1 ubuntu ubuntu 13583565 Nov 17 14:17 >>>> > clickstream.log-2012-11-17-1353161821.gz >>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47 >>>> > clickstream.log-2012-11-17-1353163623 >>>> > -rw-rw-r-- 1 ubuntu ubuntu 65648336 Nov 17 14:52 clickstream.log >>>> > >>>> > flume source directory under /mnt/flume: >>>> > drwxrwxr-x 2 ubuntu ubuntu 4096 Nov 17 14:48 clickstream >>>> > >>>> > flume source files: >>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 13:29 >>>> > clickstream.log-2012-11-17-1353158953.COMPLETED >>>> > -rw-rw-r-- 1 ubuntu ubuntu 196945008 Nov 17 14:17 >>>> > clickstream.log-2012-11-17-1353161821.COMPLETED >>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47 >>>> > clickstream.log-2012-11-17-1353163623.COMPLETED >>>> > >>>> > Any insight would be appreciated. >>>> > >>>> > Regards, >>>> > >>>> > Dan >>>> >>>> >>>> >>>> -- >>>> Apache MRUnit - Unit testing MapReduce - >>>> http://incubator.apache.org/mrunit/ >>>> >>>> >>> >> > -- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
