[jira] [Commented] (FLUME-2458) Separate hdfs tmp directory for flume hdfs sink

2016-07-21 Thread Jeff Field (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388701#comment-15388701
 ] 

Jeff Field commented on FLUME-2458:
---

Unfortunately, distcp copies dotfiles unless you exclude them with a regex, 
but the regex exclusion apparently carries over even after a file is renamed, 
so distcp will never copy the file (until you kill the snapshots and have it 
do a non-snapshot run).

That is a good solution for hiding things from tools that aren't distcp, 
though, yeah.

> Separate hdfs tmp directory for flume hdfs sink
> ---
>
> Key: FLUME-2458
> URL: https://issues.apache.org/jira/browse/FLUME-2458
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0.1
>Reporter: Sverre Bakke
>Assignee: Neerja Khattar
>Priority: Minor
> Attachments: FLUME-2458.patch, patch-2458.txt
>
>
> The current HDFS sink writes temporary files to the same directory in which 
> the final file will be stored. This is a problem for several reasons:
> 1) File moving
> When mapreduce fetches a list of files to be processed and then processes 
> files that are gone by then (i.e. have been moved from .tmp to whatever final 
> name they are supposed to have), the mapreduce job will crash.
> 2) File type
> When mapreduce decides how to process a file, it looks at the file's 
> extension. If the file is compressed, it will decompress it for you. If 
> the file has a .tmp extension (in the same folder), it will treat a 
> compressed file as an uncompressed file, thus breaking the results of the 
> mapreduce job.
> I propose that the sink gets an optional tmp path for storing these files to 
> avoid these issues.
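
The proposal above can be sketched with the local filesystem standing in for HDFS: the in-progress file is written in a separate tmp directory and only moved into the final directory once it is complete. This is an illustration under stated assumptions, not the Flume HDFS sink's code; `TmpDirWriterSketch`, `writeViaTmpDir`, and the directory layout are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpDirWriterSketch {

    /**
     * Writes data to tmpDir/name.tmp, then moves it to finalDir/name.
     * Hypothetical helper: local java.nio paths stand in for HDFS paths.
     */
    static Path writeViaTmpDir(Path tmpDir, Path finalDir, String name, byte[] data)
            throws IOException {
        Files.createDirectories(tmpDir);
        Files.createDirectories(finalDir);

        // The in-progress file lives outside the directory that jobs scan.
        Path inProgress = tmpDir.resolve(name + ".tmp");
        Files.write(inProgress, data);

        // Publish the finished file under its final name with a single move.
        Path published = finalDir.resolve(name);
        return Files.move(inProgress, published, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("flume-sketch");
        Path out = writeViaTmpDir(base.resolve("tmp"), base.resolve("data"),
                "events.gz", "payload".getBytes(StandardCharsets.UTF_8));
        System.out.println("published: " + out);
    }
}
```

With this layout a job that lists the final directory never sees a `.tmp` name, so neither the vanishing-file problem nor the wrong-extension problem arises for files in that directory.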



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50232: FLUME-2619: Spooldir source does not log channel exceptions

2016-07-21 Thread Denes Arvay

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50232/#review143088
---




flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java 
(line 211)


nit (and I know that it was the same before): as this is a getter for a 
boolean value, its name should start with an `is` prefix (or `get`, if that 
sounds better).



flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java 
(line 250)


My concern with this change is that if a `ChannelException` is thrown, 
it'll end up in the outer `catch (Throwable t)` block (line 267) and will be 
treated as a fatal error, which isn't the case currently.

As far as I understand the Jira issue, the main goal is to distinguish 
`ChannelFullException` from other `ChannelException`s and, in the latter 
case, to log the original exception, but the current behaviour shouldn't 
otherwise change.
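
A minimal sketch of the behaviour described above, assuming both kinds of channel exception should stay non-fatal and only the logging should differ. The exception classes here are simplified stand-ins for Flume's `ChannelException` and `ChannelFullException`; `deliver`, `Delivery`, and the in-memory `LOG` list are hypothetical and are not the code under review.

```java
import java.util.ArrayList;
import java.util.List;

public class ChannelExceptionHandlingSketch {

    // Simplified stand-ins for the Flume exception types (assumption).
    static class ChannelException extends RuntimeException {
        ChannelException(String message) { super(message); }
    }

    static class ChannelFullException extends ChannelException {
        ChannelFullException(String message) { super(message); }
    }

    /** Hypothetical callback that pushes one batch into the channel. */
    interface Delivery { void run(); }

    // In-memory log so the sketch has no logging dependency.
    static final List<String> LOG = new ArrayList<>();

    /** Returns true on success, false if the caller should back off and retry. */
    static boolean deliver(Delivery delivery) {
        try {
            delivery.run();
            return true;
        } catch (ChannelFullException ex) {
            // Channel is full: expected condition, keep the existing message.
            LOG.add("WARN channel is full, backing off");
            return false;
        } catch (ChannelException ex) {
            // Any other channel problem: still non-fatal, but log the cause.
            LOG.add("WARN channel error, backing off: " + ex.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        deliver(() -> { throw new ChannelFullException("queue is full"); });
        deliver(() -> { throw new ChannelException("disk error"); });
        LOG.forEach(System.out::println);
    }
}
```

Because both `catch` clauses return normally, neither exception would reach an outer `catch (Throwable t)` block in the caller.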


- Denes Arvay


On July 20, 2016, 1:25 p.m., Balázs Donát Bessenyei wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50232/
> ---
> 
> (Updated July 20, 2016, 1:25 p.m.)
> 
> 
> Review request for Flume, Denes Arvay and Attila Simon.
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> Spooldir assumes that any ChannelException means that the channel is full and 
> it does not log the exception message.
> 
> 
> Diffs
> -
> 
>   flume-ng-core/src/main/java/org/apache/flume/channel/ChannelProcessor.java 
> 1cce137 
>   
> flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java 
> d88cc1d 
>   
> flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java
>  82c5351 
> 
> Diff: https://reviews.apache.org/r/50232/diff/
> 
> 
> Testing
> ---
> 
> [INFO] Flume NG Core .. SUCCESS [08:04 min]
> 
> 
> Thanks,
> 
> Balázs Donát Bessenyei
> 
>



[jira] [Commented] (FLUME-2318) SpoolingDirectory is unable to handle empty files

2016-07-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLUME-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387420#comment-15387420
 ] 

Bessenyei Balázs Donát commented on FLUME-2318:
---

I have created a new reviewboard request: https://reviews.apache.org/r/50286/
(Rebased it on trunk and applied some auto-formatting.)

NOTE: this patch changes how the SpoolingDirectory source works. See lines 433 
and 434 in reviewboard.

> SpoolingDirectory is unable to handle empty files
> -
>
> Key: FLUME-2318
> URL: https://issues.apache.org/jira/browse/FLUME-2318
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Muhammad Ehsan ul Haque
>Assignee: Bessenyei Balázs Donát
>Priority: Minor
>  Labels: easytest, patch
> Fix For: v1.7.0
>
> Attachments: FLUME-2318-0.patch, FLUME-2318-1.patch, 
> FLUME-2318-2.patch
>
>
> Empty files should be returned as an empty event instead of no event.
> h4. Scenario
> From the start consume files in this order
> # f1: File with data or empty file
> # f2: Empty File
> # No file in spooling directory
> h4. Expected Outcome
> # channel.take() should return event with f1 data.
> # channel.take() should return event with f2 data (empty data).
> # channel.take() should return null.
> h4. What happens
> # channel.take() returns event with f1 data.
> # channel.take() returns null.
> # Exception is raised when the SpoolDirectorySource thread tries to read 
> events from the ReliableSpoolingFileEventReader. Snippet of trace is
> 2014-02-09 15:46:35,832 (pool-1-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:346)] Preparing to move file /tmp/1391957195572-0/file1 to /tmp/1391957195572-0/file1.COMPLETED
> 2014-02-09 15:46:36,334 (pool-1-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:228)] Last read was never committed - resetting mark position.
> 2014-02-09 15:46:36,335 (pool-1-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:346)] Preparing to move file /tmp/1391957195572-0/file2 to /tmp/1391957195572-0/file2.COMPLETED
> 2014-02-09 15:46:36,839 (pool-1-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:252)] FATAL: Spool Directory source null: { spoolDir: /tmp/1391957195572-0 }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
> java.lang.IllegalStateException: File should not roll when commit is outstanding.
>   at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:225)
>   at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:224)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> h4. Unit Test
> In TestSpoolDirectorySource
> {code}
>   @Test
>   public void testWithEmptyFile2()
>       throws InterruptedException, IOException {
>     Context context = new Context();
>     // f1 contains data, f2 is empty.
>     File f1 = new File(tmpDir.getAbsolutePath() + "/file1");
>     Files.write("some data".getBytes(), f1);
>     File f2 = new File(tmpDir.getAbsolutePath() + "/file2");
>     Files.write(new byte[0], f2);
>     context.put(SpoolDirectorySourceConfigurationConstants.SPOOL_DIRECTORY,
>         tmpDir.getAbsolutePath());
>     Configurables.configure(source, context);
>     source.start();
>     // Give the source thread time to pick up the files.
>     Thread.sleep(10);
>     // Take one event per file.
>     for (int i = 0; i < 2; i++) {
>       Transaction txn = channel.getTransaction();
>       txn.begin();
>       Event e = channel.take();
>       txn.commit();
>       txn.close();
>     }
>     // Both files have been consumed, so the channel must now be empty.
>     Transaction txn = channel.getTransaction();
>     txn.begin();
>     Assert.assertNull(channel.take());
>     txn.commit();
>     txn.close();
>   }
> {code}
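
The expected outcome in the issue above can be sketched as follows. This is an illustration only: `EmptyFileEventSketch` and `readEvents` are hypothetical names, events are modelled as plain byte arrays with one event per line, and the code is not the patch attached to this issue.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class EmptyFileEventSketch {

    /**
     * Reads one event per line. An empty file yields a single
     * zero-length event instead of no event at all.
     */
    static List<byte[]> readEvents(Path file) throws IOException {
        List<byte[]> events = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            events.add(line.getBytes(StandardCharsets.UTF_8));
        }
        if (events.isEmpty()) {
            // Empty file: still emit one (empty) event so the consumer sees it.
            events.add(new byte[0]);
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        Path f1 = Files.createTempFile("file1", ".txt");
        Files.write(f1, "some data".getBytes(StandardCharsets.UTF_8));
        Path f2 = Files.createTempFile("file2", ".txt"); // left empty

        System.out.println("f1 events: " + readEvents(f1).size());
        System.out.println("f2 events: " + readEvents(f2).size());
    }
}
```

In this sketch both files produce exactly one event, which matches the first two items of the expected outcome; the third `take()` returning null is a property of the channel and is not modelled here.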





Review Request 50286: FLUME-2318: SpoolingDirectory is unable to handle empty files

2016-07-21 Thread Balázs Donát Bessenyei

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50286/
---

Review request for Flume, Denes Arvay and Attila Simon.


Repository: flume-git


Description
---

Empty files should be returned as an empty event instead of no event.


Diffs
-

  
flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java
 01381a5 
  
flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java
 82c5351 

Diff: https://reviews.apache.org/r/50286/diff/


Testing
---

TestSpoolDirectorySource tests run 12/12 green


Thanks,

Balázs Donát Bessenyei



[jira] [Updated] (FLUME-2958) Add ignorePattern for TaildirSource

2016-07-21 Thread Hu Liu, (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Liu, updated FLUME-2958:
---
Description: we have tried the TaildirSource and found that it lacks an 
ignorePattern option for specifying which files to ignore. I'm glad to work on 
it if anyone could assign it to me  (was: we have tried the TaildirSource and 
found that it lacks an ignorePattern option for specifying which files to 
ignore. I'm glad to work on it if anyone assigns it to me)

> Add ignorePattern for TaildirSource
> ---
>
> Key: FLUME-2958
> URL: https://issues.apache.org/jira/browse/FLUME-2958
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.7.0
>Reporter: Hu Liu,
>
> we have tried the TaildirSource and found that it lacks an ignorePattern 
> option for specifying which files to ignore. I'm glad to work on it if anyone 
> could assign it to me
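
A hedged sketch of what such an option could do: candidate file names are matched against a regular expression and the matching ones are skipped. `IgnorePatternSketch`, `filesToTail`, and the example pattern are hypothetical; no actual TaildirSource configuration key is implied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class IgnorePatternSketch {

    /** Returns the file names that do NOT fully match the ignore pattern. */
    static List<String> filesToTail(List<String> fileNames, Pattern ignorePattern) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (!ignorePattern.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical pattern: skip compressed and temporary files.
        Pattern ignore = Pattern.compile(".*\\.(gz|tmp)$");
        List<String> names = Arrays.asList("app.log", "app.log.1.gz", "upload.tmp");
        System.out.println(filesToTail(names, ignore)); // prints [app.log]
    }
}
```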





[jira] [Commented] (FLUME-2717) Add commons-io dependency into hadoop-2 profile to enable Flume 1.5 to support Hadoop 2.7

2016-07-21 Thread Lior Zeno (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387374#comment-15387374
 ] 

Lior Zeno commented on FLUME-2717:
--

Are we planning to support hadoop-2.7.0? Currently hadoop2.version is set to 
2.4.0.
I wonder why we don't have this issue with 2.4.0, as it depends on commons-io 
2.4, and the Charsets class was added to commons-io in 2.3 
(https://commons.apache.org/proper/commons-io/javadocs/api-2.4/index.html?org/apache/commons/io/Charsets.html).
Is it because Hadoop does not use this class in 2.4.0?

> Add commons-io dependency into hadoop-2 profile to enable Flume 1.5 to 
> support Hadoop 2.7
> -
>
> Key: FLUME-2717
> URL: https://issues.apache.org/jira/browse/FLUME-2717
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: sam liu
>Assignee: Xiang Li
> Fix For: v1.7.0
>
> Attachments: FLUME-2717-001.patch
>
>
> By default, for branch origin/flume-1.5, the hadoop2.version is 2.4.0. 
> However, if we want to use hadoop-2.7.0 in flume-1.5, some Hadoop-related 
> tests will fail with the exception 'java.lang.NoClassDefFoundError: 
> org/apache/commons/io/Charsets'.
> The missing class Charsets is a new class in the commons-io 2.4 jar file and 
> is invoked by hadoop-2.7.0; however, flume-1.5 depends on commons-io 2.1, 
> which does not include the class 'Charsets'.
> Therefore the solution to enable flume-1.5 to support hadoop-2.7.0 is to add 
> commons-io 2.4 as a dependency to the hadoop-2 profile.
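
One way to confirm this kind of classpath mismatch is to probe for the class at runtime. The sketch below is an illustration only; `ClasspathProbeSketch` and `isClassAvailable` are hypothetical names, and the result for `org.apache.commons.io.Charsets` depends entirely on which commons-io jar (if any) is on the classpath.

```java
public class ClasspathProbeSketch {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathProbeSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.commons.io.Charsets";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

Running this with the same classpath as the failing tests would show whether the commons-io version in use actually provides the class.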





[jira] [Commented] (FLUME-2514) Some TestFileChannelRestart tests are extremely slow

2016-07-21 Thread Lior Zeno (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387340#comment-15387340
 ] 

Lior Zeno commented on FLUME-2514:
--

[~mpercy], do you manually resolve issues after commit or is there an automatic 
background process that does it?

> Some TestFileChannelRestart tests are extremely slow
> 
>
> Key: FLUME-2514
> URL: https://issues.apache.org/jira/browse/FLUME-2514
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Reporter: Santiago M. Mola
>Assignee: Santiago M. Mola
> Fix For: v1.7.0
>
> Attachments: FLUME-2215-0.patch, FLUME-2215-1.patch
>
>
> TestFileChannelRestart tests are really slow. For example, 
> testToggleCheckpointCompressionFromFalseToTrue and 
> testToggleCheckpointCompressionFromTrueToFalse take ~4 minutes each.
> Some of them could be made faster by using channels with lower capacity.





[jira] [Updated] (FLUME-2919) Upgrade the Solr version to 6.0.1

2016-07-21 Thread Lior Zeno (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lior Zeno updated FLUME-2919:
-
Fix Version/s: (was: v1.7.0)
   v1.8.0

> Upgrade the Solr version to 6.0.1
> -
>
> Key: FLUME-2919
> URL: https://issues.apache.org/jira/browse/FLUME-2919
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.6.0
>Reporter: Minoru Osuka
> Fix For: v1.8.0
>
> Attachments: FLUME-2919-1.patch, FLUME-2919-2.patch, FLUME-2919.patch
>
>
> Flume morphline-solr-sink is using Solr 4.3.0. Recently, Solr 6.0.1 has been 
> released. I propose to upgrade to it.





[jira] [Commented] (FLUME-2797) SyslogTcpSource uses Deprecated Class + Deprecate Syslog Source

2016-07-21 Thread Lior Zeno (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387332#comment-15387332
 ] 

Lior Zeno commented on FLUME-2797:
--

+1 for Hari's suggestion. Our documentation also says of the Multiport Syslog 
Source: "This is a newer, faster, multi-port capable version of the Syslog TCP 
source."

> SyslogTcpSource uses Deprecated Class + Deprecate Syslog Source
> ---
>
> Key: FLUME-2797
> URL: https://issues.apache.org/jira/browse/FLUME-2797
> Project: Flume
>  Issue Type: Bug
>Reporter: Otis Gospodnetic
>Priority: Minor
> Fix For: v1.7.0
>
> Attachments: FLUME-2797-0.patch
>
>
> From the mailing list:
> From Ashish:
> Source uses a deprecated class. Can you please file a JIRA for this?
> The fix is simple. In SyslogTcpSource Line#61, replace the
> CounterGroup usage with SourceCounter and make related changes in
> code. You can refer to other Sources for details. The same applies to
> SyslogUDPSource.
> From Hari:
> I think the Syslog TCP source should be deprecated in favor of the Multiport 
> Syslog Source - that is more stable and gives better performance too.





[jira] [Updated] (FLUME-2340) Refactor to make room for Morphlines Elasticsearch Sink

2016-07-21 Thread Lior Zeno (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lior Zeno updated FLUME-2340:
-
Fix Version/s: (was: v1.8.0)
   v2.0.0

> Refactor to make room for Morphlines Elasticsearch Sink
> ---
>
> Key: FLUME-2340
> URL: https://issues.apache.org/jira/browse/FLUME-2340
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Reporter: Otis Gospodnetic
> Fix For: v2.0.0
>
>
> Right now there are some non-Solr-specific classes in 
> org.apache.flume.sink.solr.morphline  and everything assumes data will get 
> loaded into Solr.  This should be refactored to make it possible to use 
> Morphlines and send data to Elasticsearch, too, for example.
> See 
> http://search-hadoop.com/m/Jrb3G1tSCQK1=Re+Questions+about+Morphline+Solr+Sink+structure





[jira] [Created] (FLUME-2958) Add ignorePattern for TaildirSource

2016-07-21 Thread Hu Liu, (JIRA)
Hu Liu, created FLUME-2958:
--

 Summary: Add ignorePattern for TaildirSource
 Key: FLUME-2958
 URL: https://issues.apache.org/jira/browse/FLUME-2958
 Project: Flume
  Issue Type: Improvement
  Components: Sinks+Sources
Affects Versions: v1.7.0
Reporter: Hu Liu,


we have tried the TaildirSource and found that it lacks an ignorePattern 
option for specifying which files to ignore. I'm glad to work on it if anyone 
assigns it to me





Re: Review Request 50232: FLUME-2619: Spooldir source does not log channel exceptions

2016-07-21 Thread Alexander Alten-Lorenz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50232/#review143058
---


Ship it!




- Alexander Alten-Lorenz

