[GitHub] flume pull request #250: FLUME-3146 Use public API HdfsDataOutputStream#getC...

2018-11-28 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/250

FLUME-3146 Use public API HdfsDataOutputStream#getCurrentBlockReplication where applicable

I took over this issue from Wei-Chiu Chuang and added a few lines to the tests.
All tests pass and the new feature works.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-3146

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #250


commit c41c97dd7d8b4d92ba99e27d404dc2ddc1b3e7ee
Author: Endre Major 
Date:   2018-11-27T10:14:06Z

FLUME-3146 Use public API HdfsDataOutputStream#getCurrentBlockReplication 
where applicable




---


[GitHub] flume pull request #246: FLUME-2723 batch size trans cap doc update

2018-11-23 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/246

FLUME-2723 batch size trans cap doc update

An update to the configuration section of the user guide.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-2723

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #246


commit c28d2d53dca28446ab03134b5bc5363b2e15e08e
Author: Endre Major 
Date:   2018-11-23T09:12:49Z

FLUME-2723 batch size trans cap doc update




---


[GitHub] flume pull request #244: FLUME-2989 added 2 KafkaChannel metrics

2018-11-22 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/244

FLUME-2989 added 2 KafkaChannel metrics

KafkaChannel was missing two metrics: eventTakeAttemptCount and 
eventPutAttemptCount.

This PR is based on the patch attached to the issue, which was the work of 
Umesh Chaudhary.
I reworked the test to use Mockito and made some other minor 
modifications to it.
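A minimal sketch of what the two attempt counters measure, assuming a hypothetical in-memory channel and counter class (not Flume's real KafkaChannel or ChannelCounter): an attempt is counted on every put or take call, while a success is counted only when an event actually moves.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter holder; the field names mirror the metric names in this PR.
class AttemptCounters {
  final AtomicLong eventPutAttemptCount = new AtomicLong();
  final AtomicLong eventPutSuccessCount = new AtomicLong();
  final AtomicLong eventTakeAttemptCount = new AtomicLong();
  final AtomicLong eventTakeSuccessCount = new AtomicLong();
}

// Hypothetical bounded channel that shows where each counter is incremented.
class CountingChannel {
  private final Deque<String> queue = new ArrayDeque<>();
  private final int capacity;
  final AttemptCounters counters = new AttemptCounters();

  CountingChannel(int capacity) {
    this.capacity = capacity;
  }

  synchronized boolean put(String event) {
    counters.eventPutAttemptCount.incrementAndGet(); // counted even if the put fails
    if (queue.size() >= capacity) {
      return false;
    }
    queue.addLast(event);
    counters.eventPutSuccessCount.incrementAndGet();
    return true;
  }

  synchronized String take() {
    counters.eventTakeAttemptCount.incrementAndGet(); // counted even if nothing is available
    String event = queue.pollFirst();
    if (event != null) {
      counters.eventTakeSuccessCount.incrementAndGet();
    }
    return event;
  }
}
```

With a capacity of 1, two puts and two takes yield two attempts and one success in each direction, which is exactly the gap the attempt metrics make visible.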


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-2989

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #244


commit 8de6c94089ddbfbd88e2d13a47c91fa0800bc7d6
Author: Endre Major 
Date:   2018-11-22T22:09:38Z

FLUME-2989 added KafkaChannel metrics eventTakeAttemptCount, 
eventPutAttemptCount




---


[GitHub] flume pull request #243: FLUME-3243 hdfs.callTimeout default increased and ...

2018-11-22 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/243

FLUME-3243 hdfs.callTimeout default increased and deprecated

The default hdfs.callTimeout used by the HDFS sink was too low: only 10 
seconds, which can cause problems on a busy system.
The new default is 30 seconds.
I think this parameter should be deprecated in favour of a new, more 
error-tolerant solution. To prepare for that future change, I marked it as 
deprecated in the code and in the User Guide.
Tested only with the unit tests.
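The timeout can be pictured as a bounded wait on a call running on another thread. The sketch below is hypothetical (the class and the `callWithTimeout` helper are illustrative names, not the HDFS sink's actual code); it uses `Future.get(timeout, unit)` together with the new 30-second default.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class HdfsCallTimeoutSketch {
  // Default proposed in this PR: 30 seconds (previously 10 seconds).
  static final long DEFAULT_CALL_TIMEOUT_MS = 30_000L;

  // Runs an HDFS-style operation on the pool and waits at most timeoutMs for it.
  static <T> T callWithTimeout(ExecutorService pool, Callable<T> call, long timeoutMs)
      throws IOException, InterruptedException, ExecutionException {
    Future<T> future = pool.submit(call);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the slow call so the worker thread is freed
      throw new IOException("Call timed out after " + timeoutMs + " ms", e);
    }
  }
}
```

On a busy cluster a single slow NameNode or DataNode response can exceed 10 seconds, so a longer default trades slower failure detection for fewer spurious timeouts.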

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-3243

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/243.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #243


commit 24c9e5f781fd7ca53c061f1ce5f9a6a555bf95c3
Author: Endre Major 
Date:   2018-11-22T20:19:41Z

FLUME-3243 hdfs.callTimeout deafault increased and deprecated




---


[GitHub] flume pull request #242: FLUME-1342 adding jmx metrics tables to docs

2018-11-22 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/242

FLUME-1342 adding jmx metrics tables to docs

This PR adds a few tables to the User Guide that describe the metrics 
published by sources, sinks and channels.
I used simple Unix tools to gather the data, then wrote a small utility to 
convert it to CSV.
Then I used an online converter, https://www.tablesgenerator.com/, to 
generate the RST tables, followed by a little manual editing.
I also discovered some RST formatting problems in FlumeUserGuide.rst and 
corrected them.
It was a rather painful process to gather the data and find a decent 
representation.
So far this PR only contains the end result. I would be happy to share the 
utilities; I just don't know what the best way to do that would be.
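As an alternative to the online converter step, RST grid tables can also be generated directly from the gathered rows. The sketch below is hypothetical (it is not the utility mentioned above) and simply pads each cell to its column width, using `=` for the border under the header row as reStructuredText requires.

```java
import java.util.List;

class RstGridTable {
  // Renders a header row plus data rows as a reStructuredText grid table.
  static String render(List<String> header, List<List<String>> rows) {
    int[] widths = new int[header.size()];
    for (int c = 0; c < widths.length; c++) {
      widths[c] = header.get(c).length();
      for (List<String> row : rows) {
        widths[c] = Math.max(widths[c], row.get(c).length());
      }
    }
    StringBuilder out = new StringBuilder();
    out.append(border(widths, '-'));
    out.append(row(header, widths));
    out.append(border(widths, '=')); // '=' separates the header from the body
    for (List<String> r : rows) {
      out.append(row(r, widths));
      out.append(border(widths, '-'));
    }
    return out.toString();
  }

  private static String border(int[] widths, char fill) {
    StringBuilder line = new StringBuilder("+");
    for (int w : widths) {
      line.append(repeat(fill, w + 2)).append('+');
    }
    return line.append('\n').toString();
  }

  private static String row(List<String> cells, int[] widths) {
    StringBuilder line = new StringBuilder("|");
    for (int c = 0; c < widths.length; c++) {
      String cell = cells.get(c);
      // One space of left padding, then enough right padding to reach the column width.
      line.append(' ').append(cell).append(repeat(' ', widths[c] - cell.length() + 1)).append('|');
    }
    return line.append('\n').toString();
  }

  private static String repeat(char ch, int n) {
    StringBuilder sb = new StringBuilder(n);
    for (int i = 0; i < n; i++) {
      sb.append(ch);
    }
    return sb.toString();
  }
}
```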

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-1342

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/242.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #242


commit e2bbbc85dd3d07425322f2a335bea4918f071b44
Author: Endre Major 
Date:   2018-11-22T18:52:45Z

FLUME-1342 adding jmx metrics tables to docs




---


[GitHub] flume pull request #237: FLUME-2653 Allow hdfs sink inUseSuffix to be empty

2018-11-15 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/237

FLUME-2653 Allow hdfs sink inUseSuffix to be empty

This is based on the contributions for FLUME-2653 regarding a new feature 
for the HDFS sink.
I added a new parameter, hdfs.emptyInUseSuffix, to allow the output file name 
to remain unchanged while it is being written.
See the User Guide changes for details.
This is a feature requested by the community.

I added a new JUnit test case for testing.
As an extra safety check, I temporarily modified the old test cases in my IDE 
to use the new flag, and they passed. Those modifications are not part of 
this PR.
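A minimal sketch of the naming rule, assuming the in-use name is built as directory + `hdfs.inUsePrefix` + file name + in-use suffix, with `.tmp` (the sink's documented default suffix) as the example. The class and method names are hypothetical, not the sink's actual code.

```java
class InUseFileNames {
  // Name of the file while it is still being written.
  static String inUseName(String dir, String inUsePrefix, String fileName,
      String inUseSuffix, boolean emptyInUseSuffix) {
    // With emptyInUseSuffix enabled the configured suffix is ignored.
    String suffix = emptyInUseSuffix ? "" : inUseSuffix;
    return dir + "/" + inUsePrefix + fileName + suffix;
  }

  // Name of the file after it has been closed.
  static String finalName(String dir, String fileName) {
    return dir + "/" + fileName;
  }

  // A rename on close is only needed when the two names differ.
  static boolean needsRename(String inUseName, String finalName) {
    return !inUseName.equals(finalName);
  }
}
```

With an empty prefix and the flag enabled, the in-use and final names coincide, so downstream readers see the same path before and after the file is closed.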

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-2653

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #237


commit 930476595e70b2ecb5fd3a21a732b82391d351f8
Author: Endre Major 
Date:   2018-11-14T17:44:02Z

FLUME-2653 Allow inUseSuffix to be empty




---


[GitHub] flume pull request #235: Flume 3281 Update to Kafka 2.0

2018-11-12 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/235

Flume 3281 Update to Kafka 2.0

This has been tested with the unit tests. The change that caused the most 
problems is consumer.poll(Duration). It does not block beyond its timeout, 
even while fetching metadata, whereas the previous poll(long timeout) could 
block indefinitely while fetching metadata.
This resulted in many test timing issues. I tried to keep the changes to the 
tests minimal, just enough to make them pass.
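Because a single poll(Duration) call may now return with no records while metadata is still being fetched, tests typically have to poll repeatedly until the expected records arrive or an overall deadline passes. The helper below is a hypothetical sketch of that pattern (it is not the test code from this PR); the poll function is abstracted as a `Supplier` so the sketch runs without a Kafka broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class PollUntil {
  // Calls a poll function that may return early with no records until `expected`
  // records have been collected or `timeoutMs` has elapsed.
  static <T> List<T> pollUntil(Supplier<List<T>> poll, int expected, long timeoutMs) {
    List<T> collected = new ArrayList<>();
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (collected.size() < expected && System.nanoTime() - deadline < 0) {
      collected.addAll(poll.get());
    }
    return collected;
  }
}
```

With a real consumer, the supplier would wrap `consumer.poll(Duration)` and copy the returned records into a list; the overall deadline, not the individual poll timeout, then decides when a test gives up.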

Kafka 2.0 requires a newer slf4j version, so I had to update it to 1.7.25.

The migrateZookeeperOffsets option is deprecated in this PR. This will allow 
us to get rid of the Kafka server libraries in Flume.

Compatibility testing:
I modified TestUtil to be able to use external servers. This way I could test 
against a variety of Kafka server versions using the normal unit tests.
Channel tests using 2.0.1 client:
Kafka_2.11-0.11.0.3 - timeouts in TestPartitions when creating topics
Kafka_2.11-1.0.2 - passed
Kafka_2.11-1.1.1 - passed
Kafka_2.11-2.0.1 - passed 

I will publish further results here, today or tomorrow.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-3281

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #235


commit 2b6818ad2f8c9d8367ba5526f800583f29967464
Author: Endre Major 
Date:   2018-11-08T14:36:47Z

FLUME-3281 Update to Kafka 2.0

commit e1a98bf98bdfa2fc3f524e94ed5e5603477f7820
Author: Endre Major 
Date:   2018-11-10T16:14:06Z

FLUME-3281 TestUtil improvements external servers

commit 34ce07c9ba3cc388630a6fd2e5cb14f94b665ca5
Author: Endre Major 
Date:   2018-11-12T13:31:34Z

FLUME-3281 Deprecating migrateZookeeperOffsets




---


[GitHub] flume pull request #229: FLUME-3223 Flume HDFS Sink should retry close prior...

2018-09-27 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/229

FLUME-3223 Flume HDFS Sink should retry close prior release lease

This is based on @mcsanady's original pull request #202.
I took the test changes from him but reworked the implementation of the new 
feature, since it failed some unit tests.
Previously, when a close failed, we immediately recovered the lease. This PR 
introduces a background retry mechanism. It uses the existing 
"hdfs.closeTries" parameter. Unfortunately, it defaults to unlimited retries, 
which seems too long to me.
I also did a minimal code cleanup.
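A minimal sketch of a background close retry, assuming a hypothetical `Closer` callback and a fixed retry interval (the class, interface and interval parameter are illustrative, not the sink's actual code). As with hdfs.closeTries, a value of 0 means retry without limit; only after the last failed attempt does the fallback, such as lease recovery, run.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CloseRetrier {
  // A close operation that may fail, e.g. closing an HDFS output stream.
  interface Closer {
    void close() throws Exception;
  }

  private final ScheduledExecutorService scheduler;
  private final int closeTries;        // 0 means retry without limit
  private final long retryIntervalMs;  // hypothetical fixed delay between attempts

  CloseRetrier(ScheduledExecutorService scheduler, int closeTries, long retryIntervalMs) {
    this.scheduler = scheduler;
    this.closeTries = closeTries;
    this.retryIntervalMs = retryIntervalMs;
  }

  // The first attempt runs on the caller's thread; failed attempts are rescheduled in
  // the background, and the fallback runs only after closeTries attempts have failed.
  void closeWithRetries(Closer closer, Runnable fallback) {
    attempt(closer, fallback, 1);
  }

  private void attempt(Closer closer, Runnable fallback, int tryNumber) {
    try {
      closer.close();
    } catch (Exception e) {
      if (closeTries == 0 || tryNumber < closeTries) {
        scheduler.schedule(() -> attempt(closer, fallback, tryNumber + 1),
            retryIntervalMs, TimeUnit.MILLISECONDS);
      } else {
        fallback.run();
      }
    }
  }
}
```

Running retries on a scheduler keeps the sink's processing thread free, which is why an unlimited default is less harmful than it looks, although it can leave files open for a long time.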

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-3223

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/229.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #229


commit 5514f0489ae6091bec5eea814a4c8a9990eede35
Author: Endre Major 
Date:   2018-09-27T12:05:38Z

FLUME-3223 Flume HDFS Sink should retry close prior release lease




---


[GitHub] flume pull request #226: FLUME-2973 BucketWriter deadlock fix

2018-08-28 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/226

FLUME-2973 BucketWriter deadlock fix

This PR is based on Yan Jian's fix and his test improvements. It also 
contains the deadlock reproduction contributed by @adenes.
I made only minimal changes to those contributions.
Denes's test was used to verify the fix.
Yan's fix also contains an optimization: it first calls the callback function 
that removes the BucketWriter from the cache. This should help avoid some 
errors.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-2973

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/226.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #226






---


[GitHub] flume pull request #222: FLUME-3050 add counters for error conditions and ex...

2018-08-06 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/222

FLUME-3050 add counters for error conditions and expose to monitor URL

Concept: an error is when an Exception is thrown or an ERROR-level log message 
is written during event processing.
In case of an error, at least one error counter is incremented at least once 
(preferably exactly one counter, exactly once).
Only errors during event processing are counted; initialization errors are 
not handled here.
Three types of errors are differentiated:
- Channel read/write errors: the channel throws a ChannelException.
- Event read/write errors, e.g.: A source cannot read an event due to
- Generic errors, e.g.: TaildirSource cannot write the position file.
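A minimal sketch of the "one counter, once" rule for a source, assuming hypothetical reader and channel interfaces and a local stand-in for Flume's ChannelException (none of these are the actual counter classes from this PR). Each catch site increments the counter for its own category and then stops, so a single failure is never double-counted.

```java
import java.io.IOException;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

class SourceErrorCounters {
  // The three error categories from the PR description.
  enum ErrorType { CHANNEL_READ_WRITE, EVENT_READ_WRITE, GENERIC }

  // Hypothetical stand-in for org.apache.flume.ChannelException.
  static class ChannelException extends RuntimeException {
    ChannelException(String message) {
      super(message);
    }
  }

  interface EventReader {
    String read() throws IOException;
  }

  interface ChannelWriter {
    void put(String event); // may throw ChannelException
  }

  private final Map<ErrorType, AtomicLong> counters = new EnumMap<>(ErrorType.class);

  SourceErrorCounters() {
    for (ErrorType type : ErrorType.values()) {
      counters.put(type, new AtomicLong());
    }
  }

  long get(ErrorType type) {
    return counters.get(type).get();
  }

  // Also used directly for generic errors, e.g. a failed position-file write.
  void increment(ErrorType type) {
    counters.get(type).incrementAndGet();
  }

  // Processes one event; each failure increments exactly one counter, exactly once.
  boolean processOne(EventReader reader, ChannelWriter channel) {
    String event;
    try {
      event = reader.read();
    } catch (IOException e) {
      increment(ErrorType.EVENT_READ_WRITE);
      return false;
    }
    try {
      channel.put(event);
      return true;
    } catch (ChannelException e) {
      increment(ErrorType.CHANNEL_READ_WRITE);
      return false;
    }
  }
}
```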


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-3050

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #222


commit c82d23011aa5dcc47df997f47792e8ececf88303
Author: emajor 
Date:   2018-07-20T15:38:34Z

FLUME-3050 WIP

commit 8245d210f186fef06a3f7d996116f7c02e66f552
Author: emajor 
Date:   2018-07-24T12:43:24Z

WIP

commit 83ae524a37acfcdd2442128fc19b26cdf30f1b45
Author: emajor 
Date:   2018-07-30T10:03:39Z

WIP tests

commit eecd494a6b0c7e2000398429520014a143f8ea30
Author: emajor 
Date:   2018-07-30T11:50:30Z

clean up 1

commit b4c9afabd4621d5f68a403644c75bc2c3f211be4
Author: emajor 
Date:   2018-07-30T16:12:24Z

clean up 2

commit cc1d88abc31c5ae81cc16842d5d14418e5176b8b
Author: emajor 
Date:   2018-07-30T16:17:59Z

clean up 3

commit 37594abeb2fbd2d695d3585d0351d7295810b5c4
Author: emajor 
Date:   2018-07-31T14:57:39Z

WIP adding further tests

commit bc6e4fc18ecfabd0e2a8c9f7911573ee50ce60e7
Author: emajor 
Date:   2018-08-01T16:40:31Z

further tests

commit d200eda3195f84b89580aabd5bdac19a9c8c0f8e
Author: emajor 
Date:   2018-08-06T09:45:47Z

morphline error counter added

commit dd851dda8d3d95c1a37563a9012e153c79a17b37
Author: emajor 
Date:   2018-08-06T13:51:15Z

cleanup and test fix

commit 63dff5781adeaab7d8aea74a45e0e9b33e2be06b
Author: emajor 
Date:   2018-08-06T15:23:11Z

Adding error counters to ScribeSource




---


[GitHub] flume pull request #214: FLUME-3239 Do not rename files in SpoolDirectorySou...

2018-07-03 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/214

FLUME-3239 Do not rename files in SpoolDirectorySource

Added functionality to track files in the meta directory rather than 
renaming them.
Improved tests for checking multilevel directories.
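One way to picture tracking files in the meta directory instead of renaming them: completed files are recorded in a tracker file, and the spool file itself is left untouched. The sketch below is hypothetical (the class and the `processed-files.txt` name are illustrative, not this PR's implementation) and keys entries by absolute path so multilevel directories do not collide.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ProcessedFileTracker {
  private final Path trackerFile;
  private final Set<String> processed = new HashSet<>();

  ProcessedFileTracker(Path metaDir) throws IOException {
    Files.createDirectories(metaDir);
    this.trackerFile = metaDir.resolve("processed-files.txt"); // hypothetical file name
    if (Files.exists(trackerFile)) {
      // Reload previously completed files so a restart does not re-ingest them.
      processed.addAll(Files.readAllLines(trackerFile, StandardCharsets.UTF_8));
    }
  }

  boolean isProcessed(Path spoolFile) {
    return processed.contains(spoolFile.toAbsolutePath().toString());
  }

  // Records the file as done by appending one line; the spool file is not renamed.
  void markProcessed(Path spoolFile) throws IOException {
    String key = spoolFile.toAbsolutePath().toString();
    if (processed.add(key)) {
      Files.write(trackerFile, List.of(key), StandardCharsets.UTF_8,
          StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
  }
}
```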

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-3239

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #214


commit 878088fd970cfcaff9ed0a1ce656870f22348532
Author: Endre Major 
Date:   2018-06-05T12:35:40Z

FLUME-3239 Do not rename files in SpoolDirectorySource




---


[GitHub] flume pull request #212: WIP FLUME-3246 Validate flume configuration to prev...

2018-06-19 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/212

WIP FLUME-3246 Validate flume configuration to prevent larger source batch size than the channel transaction capacity

The loadSources() method seemed like an appropriate place to check this.
I added 2 new interfaces for getting the transaction capacity and the batch 
size fields. The check is only done for channels that implement the 
TransactionCapacitySupported interface and sources that implement the 
BatchSizeSupported interface.
There is a new unit test case that I used for testing.

TODOs:
Add the BatchSizeSupported interface to all the sources that handle batch 
size.
Check how this works when reloading configuration.
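A minimal sketch of the check, using local interfaces that mirror the names from this PR. The single getter on each interface and the boolean result are assumptions made for illustration; components that implement neither interface are skipped, as described above.

```java
// Stand-ins mirroring the interface names from this PR; the getters are assumed.
interface BatchSizeSupported {
  long getBatchSize();
}

interface TransactionCapacitySupported {
  long getTransactionCapacity();
}

class BatchSizeValidator {
  // Returns false when a source's batch size exceeds the channel's transaction capacity.
  // Components that do not implement the interfaces are not checked.
  static boolean isValid(Object source, Object channel) {
    if (source instanceof BatchSizeSupported && channel instanceof TransactionCapacitySupported) {
      long batchSize = ((BatchSizeSupported) source).getBatchSize();
      long capacity = ((TransactionCapacitySupported) channel).getTransactionCapacity();
      return batchSize <= capacity;
    }
    return true;
  }
}
```

A source whose batch is larger than the channel's transaction capacity could never commit a full batch in one transaction, which is why catching it at configuration time is preferable to failing at runtime.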

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-3246

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #212


commit b548e41f4299e45a3b9e1f74c080203c4c301774
Author: emajor 
Date:   2018-06-19T12:54:50Z

FLUME-3246 Validate flume configuration to prevent larger source batch size 
than the channel transaction capacity




---


[GitHub] flume pull request #211: FLUME-3239 Do not rename files in SpoolDirectorySou...

2018-06-07 Thread majorendre
Github user majorendre closed the pull request at:

https://github.com/apache/flume/pull/211


---


[GitHub] flume pull request #211: FLUME-3239 Do not rename files in SpoolDirectorySou...

2018-06-05 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/211

FLUME-3239 Do not rename files in SpoolDirectorySource WIP

Work in progress.
Added functionality to track files in the meta directory rather than 
renaming them.

This is an early preview to check if the direction is right.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-3239

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #211


commit c0695ad01f172248341eec849ca2b4d0848819b1
Author: Endre Major 
Date:   2018-06-05T12:35:40Z

FLUME-3239 Do not rename files in SpoolDirectorySource




---


[GitHub] flume pull request #208: FLUME-3222 Fix for NoSuchFileException thrown when ...

2018-05-30 Thread majorendre
GitHub user majorendre opened a pull request:

https://github.com/apache/flume/pull/208

FLUME-3222 Fix for NoSuchFileException thrown when files are being deleted from the TAILDIR source

We fetch file names from a directory and later fetch their inodes.
If a file is deleted between these two operations, this exception is thrown.
I reproduced the problem in a unit test and added exception handling for this 
case: it is enough to ignore the NoSuchFileException and continue.
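The race and its fix can be sketched as follows. This is a hypothetical illustration, not the TaildirSource code: it uses the portable `BasicFileAttributes.fileKey()` (which carries the inode on Unix-like systems) as a stand-in for the inode lookup, and skips any file that vanished after it was listed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InodeLookup {
  // Step 1: list the regular files currently in the directory.
  static List<Path> listFiles(Path dir) throws IOException {
    List<Path> files = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path p : stream) {
        if (Files.isRegularFile(p)) {
          files.add(p);
        }
      }
    }
    return files;
  }

  // Step 2: read each file's key; a file deleted since step 1 is simply skipped.
  static Map<Path, Object> fileKeys(List<Path> listed) throws IOException {
    Map<Path, Object> keys = new LinkedHashMap<>();
    for (Path p : listed) {
      try {
        keys.put(p, Files.readAttributes(p, BasicFileAttributes.class).fileKey());
      } catch (NoSuchFileException e) {
        // Deleted between listing and lookup: ignore it and continue.
      }
    }
    return keys;
  }
}
```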

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majorendre/flume FLUME-3222

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #208


commit c291b621514f5aa1e0f9fcdc5ba897c66d4ce43f
Author: Endre Major 
Date:   2018-05-29T14:31:27Z

FLUME-3222 Fix for NoSuchFileException thrown when files are being deleted 
from the TAILDIR source




---