Mike Percy has posted comments on this change.

Change subject: kudu flume sink blog post

Patch Set 4:


Thanks for the update, Ara! I've provided a bunch of feedback.

Also, if you don't mind, please remove the trailing spaces in the file (they
are highlighted in red on Gerrit).

File _posts/2016-07-06-flume.md:

Line 25: So, in a nutshell, _batch processing_ is:
s/So, in a nutshell/To summarize/

Line 27: - primitive
I don't really understand what you mean by primitive here. I would consider
removing this dimension of the comparison. Spark, for example, is batch-oriented
but quite rich in its APIs and capabilities, I think.

Line 29: - batch-oriented
How about: s/batch-oriented/a paradigm that processes large chunks of data as a 

Line 30: - fast ingest, slow query 
high latency and high throughput, both for ingest and query

Line 31: - easy to program, but hard to orchestrate
typically easy to program

Line 32: - easy to write ad-hoc, though slow, queries
well suited for writing ad-hoc queries, although they are typically high latency

Line 36: - not primitive, has rich constructs such as time windows
s/not primitive, has rich constructs such as time windows/a totally different 
paradigm, which involves single events and time windows instead of large groups 
of events/

Line 37: - still file-based and not a long-term database
typically still file-oriented, instead of table-oriented

Line 39: - ultra-fast ingest and ultra-fast query (query results basically 
very low latency and often low throughput both for ingest and query (query 
results are typically pre-calculated at ingest time)

Line 40: - not so easy to program, relatively easy to orchestrate
often difficult to program for

Line 45: - not primitive, thanks to SQL support via Impala
flexible and expressive, thanks to SQL support via Apache Impala (incubating)

Line 46: - a real long-term database, with SQL support via Impala
a table-oriented, mutable data store that feels like a traditional relational 

Line 47: - neither batch nor streaming
I'm not sure what you mean by neither batch nor streaming. Maybe just remove

Line 48: - fast ingest and fast query
low-latency and relatively high throughput, both for ingest and query

Line 83: can see, nowhere Hadoop is mentioned but Flume is typically used for 
ingesting data to Hadoop
s/nowhere Hadoop is mentioned/nowhere is Hadoop mentioned/

PS4, Line 88: _agent_
add a comma after _agent_

PS4, Line 129: vmstat

PS4, Line 132: SimpleKuduEventProducer

PS4, Line 135: KuduSink

PS4, Line 136: from Kudu distributio
from the Kudu distribution

Line 137: `$FLUME_HOME/plugins.d/kudu-sink/lib` in the Flume installation. The 
jar file contains KuduSink
missing word: `$FLUME_HOME/plugins.d/kudu-sink/lib` directory

PS4, Line 137: KuduSink

Line 138: and all the dependencies of it (including Kudu java client classes).
s/and all the dependencies of it/and all of its dependencies/

PS4, Line 142: Kudu Flume Sink
The Kudu Flume Sink

PS4, Line 143: it runs
before the Kudu Flume Sink is started.

PS4, Line 205: SimpleKuduEventProducer

PS4, Line 236: from 
from the

PS4, Line 238: SimpleKuduEventProducer

Line 248: on the built-in ones.
Add: In the future, we plan to add more flexible event producer implementations 
so that creation of a custom event producer is not required to write data to 
Kudu. See here for a work-in-progress generic event producer for Avro-encoded 
Events: https://gerrit.cloudera.org/#/c/4034/

Line 252: Kudu is a scalable database which lets us ingest insane amounts of 
data per second. Apache Flume
s/database/data store/

PS4, Line 253: library

Line 256:
At the end, if you want to, consider adding a bio paragraph about yourself in 
italics. Maybe something like this?

Ara Abrahamian is a software engineer at Argyle Data building X. <insert 
whatever else you want about yourself>. Ara is the original author of the Flume 
Kudu Sink that is included in the Kudu distribution.

To view, visit http://gerrit.cloudera.org:8080/3510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I810146ab24c88bc6cc562d81746b9bf5303396ed
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Ara Ebrahimi <ara.ebrah...@argyledata.com>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-HasComments: Yes
