Mike Percy has posted comments on this change.

Change subject: kudu flume sink blog post
......................................................................


Patch Set 4:

(32 comments)

Thanks for the update, Ara! I've provided a bunch of feedback.

Also, if you don't mind, please remove the trailing spaces in the file (they 
are highlighted in red on Gerrit).

http://gerrit.cloudera.org:8080/#/c/3510/4/_posts/2016-07-06-flume.md
File _posts/2016-07-06-flume.md:

Line 25: So, in a nutshell, _batch processing_ is:
s/So, in a nutshell/To summarize/


Line 27: - primitive
I don't really understand what you mean by primitive here. I would consider 
removing this dimension of the comparison. A counterexample is Spark, which is 
batch-oriented but, I think, quite rich in its APIs and capabilities.


Line 29: - batch-oriented
How about: s/batch-oriented/a paradigm that processes large chunks of data as a 
group/


Line 30: - fast ingest, slow query 
high latency and high throughput, both for ingest and query


Line 31: - easy to program, but hard to orchestrate
typically easy to program


Line 32: - easy to write ad-hoc, though slow, queries
well suited for writing ad-hoc queries, although they are typically high latency


Line 36: - not primitive, has rich constructs such as time windows
s/not primitive, has rich constructs such as time windows/a totally different 
paradigm, which involves single events and time windows instead of large groups 
of events/


Line 37: - still file-based and not a long-term database
typically still file-oriented, instead of table-oriented


Line 39: - ultra-fast ingest and ultra-fast query (query results basically 
pre-calculated)
very low latency and often low throughput both for ingest and query (query 
results are typically pre-calculated at ingest time)


Line 40: - not so easy to program, relatively easy to orchestrate
often difficult to program for


Line 45: - not primitive, thanks to SQL support via Impala
flexible and expressive, thanks to SQL support via Apache Impala (incubating)


Line 46: - a real long-term database, with SQL support via Impala
a table-oriented, mutable data store that feels like a traditional relational 
database


Line 47: - neither batch nor streaming
I'm not sure what you mean by neither batch nor streaming. Maybe just remove 
this?


Line 48: - fast ingest and fast query
low-latency and relatively high throughput, both for ingest and query


Line 83: can see, nowhere Hadoop is mentioned but Flume is typically used for 
ingesting data to Hadoop
s/nowhere Hadoop is mentioned/nowhere is Hadoop mentioned/


PS4, Line 88: _agent_
add a comma after _agent_


PS4, Line 129: vmstat
`vmstat`


PS4, Line 132: SimpleKuduEventProducer
`SimpleKuduEventProducer`


PS4, Line 135: KuduSink
`KuduSink`


PS4, Line 136: from Kudu distributio
from the Kudu distribution


Line 137: `$FLUME_HOME/plugins.d/kudu-sink/lib` in the Flume installation. The 
jar file contains KuduSink
missing word: `$FLUME_HOME/plugins.d/kudu-sink/lib` directory


PS4, Line 137: KuduSink
`KuduSink`


Line 138: and all the dependencies of it (including Kudu java client classes).
s/and all the dependencies of it/and all of its dependencies/


PS4, Line 142: Kudu Flume Sink
The Kudu Flume Sink


PS4, Line 143: it runs
before the Kudu Flume Sink is started.


PS4, Line 205: SimpleKuduEventProducer
`SimpleKuduEventProducer`


PS4, Line 236: from 
from the


PS4, Line 238: SimpleKuduEventProducer
`SimpleKuduEventProducer`


Line 248: on the built-in ones.
Add: In the future, we plan to add more flexible event producer implementations 
so that creation of a custom event producer is not required to write data to 
Kudu. See here for a work-in-progress generic event producer for Avro-encoded 
Events: https://gerrit.cloudera.org/#/c/4034/


Line 252: Kudu is a scalable database which lets us ingest insane amounts of 
data per second. Apache Flume
s/database/data store/


PS4, Line 253: library
s/library//


Line 256
At the end, if you want to, consider adding a bio paragraph about yourself in 
italics. Maybe something like this?

Ara Ebrahimi is a software engineer at Argyle Data building X. <insert 
whatever else you want about yourself>. Ara is the original author of the Flume 
Kudu Sink that is included in the Kudu distribution.


-- 
To view, visit http://gerrit.cloudera.org:8080/3510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I810146ab24c88bc6cc562d81746b9bf5303396ed
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Ara Ebrahimi <ara.ebrah...@argyledata.com>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-HasComments: Yes
