Misty Stanley-Jones has posted comments on this change.

Change subject: kudu flume sink blog post
......................................................................


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/3510/1/_posts/2016-07-06-flume.md
File _posts/2016-07-06-flume.md:

Line 6: In this article I will discuss the Kudu Flume Sink. But before doing 
that let me tell you why we considered Kudu to begin with, and what Flume does 
and how it fits in an architecture involving Kudu.
Consider rephrasing to something like:

This post discusses the Kudu Flume Sink. First, I'll give some background on 
why we considered using Kudu, what Flume does for us, and how Flume fits with 
Kudu in our project.


Line 9: ====
In blog posts, lists work better than very long paragraphs. Consider distilling 
this content into some bullet points.


Line 10: There are many different ways of looking at Kudu. One way is to look 
at it as a tool which can be used to build system which are closer to 
_real-time_ processing of big data but without using _streaming_ software.
nit: long lines (and following)

Try to stay under 100 characters to make Gerrit reviews easier.


Line 12: Traditionally in the Hadoop ecosystem we've dealt with various _batch 
processing_ technologies such as Map/Reduce and the many libraries and tools 
built on top of it in various languages (Apache Pig, Apache Hive, Apache Oozie 
and many other things). The main problem with this approach is that it needs to 
process the whole data set in batches, again and again, as soon as new data 
gets added. Things get really complicated when a few such tasks need to get 
chained together, or when the same data set needs to be processed in various 
ways by different jobs, while all compete for the shared cluster resources. The 
whole _orchestration_ becomes a nightmare over time. The opposite of this 
approach is _stream processing_: process the data as soon as it arrives, not in 
batches. Streaming systems such as Spark Streaming, Storm, Kafka Streams, and 
many others make that possible. But writing streaming services is not trivial. 
The streaming systems are becoming more and more capab!
 le and support more complex constructs, but sometimes you just long for a good 
old database which you can simply store data inside and then query and write 
business logic on top. No slowness and primitiveness of batch processing, and 
no complexity of streaming. Something in between. That's what Kudu is, from 
this point of view. It's a scalable database, you can store big amounts of data 
in it with very impressive ingestion rates, enrich, delete and update that 
data, and generate views and reports. You can pretend it's a good old SQL 
database, but with scalability built in. The ability to use a real database 
instead of a bunch of files is quite empowering and leads to reduced 
complexity. Let's be honest, a bunch of files is not a database, and databases 
are popular for a reason: they enable us to write business logic on top of them 
with ease.
Generally, MapReduce is written without the slash these days.

Consider a simple diagram or flow chart to augment what you are saying here. I 
get it, but it is a little hard to follow just in one long paragraph.
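Even a rough text sketch would get the contrast across in the meantime, 
something like:

    batch:     new data arrives -> reprocess the whole data set -> results (hours later)
    streaming: new data arrives -> stream processor -> results (fast, but complex to build)
    Kudu:      new data arrives -> insert/update in Kudu -> query it like a database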


Line 13:           
nit: whitespace (and following)


Line 18: According to their website "Flume is a distributed, reliable, and 
available service for efficiently collecting, aggregating, and moving large 
amounts of log data. It has a simple and flexible architecture based on 
streaming data flows. It is robust and fault tolerant with tunable reliability 
mechanisms and many failover and recovery mechanisms." As you can see nowhere 
Hadoop is mentioned but Flume is typically used for ingesting data to Hadoop 
clusters. 
Maybe just include the quote and link to where you found it.
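Something like this, maybe (I'm assuming the quote came from the Flume 
homepage; adjust the link if you found it elsewhere):

    According to the [Apache Flume website](http://flume.apache.org):

    > Flume is a distributed, reliable, and available service for efficiently
    > collecting, aggregating, and moving large amounts of log data. [...]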


Line 31: 
Do you need to use backticks to turn this into a code block, or is the 
indentation sufficient?


Line 59: Parameter Name      | Default                                       | 
Description
Consider formatting this as a real table, or putting it into a code block so 
that it scrolls horizontally. Markdown lets you use pipes for tables, or you 
can embed an HTML table within Markdown.
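If the site is Jekyll with kramdown (the usual gh-pages default), a pipe table 
like this should render; I've kept your header and left placeholder rows for 
you to fill in:

    | Parameter Name | Default | Description |
    | -------------- | ------- | ----------- |
    | ...            | ...     | ...         |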


Line 70:     public class SimpleKuduEventProducer implements KuduEventProducer {
Again, make sure this ends up in a code block.
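For instance, a fence with a language hint, same pattern as above (body elided 
here; keep yours as-is):

    ```java
    public class SimpleKuduEventProducer implements KuduEventProducer {
      // ...
    }
    ```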


Line 139: Conclusion
Do you want to put some example code into Git somewhere so people can check it 
out and play with it? That might be a nice thing to do.


-- 
To view, visit http://gerrit.cloudera.org:8080/3510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I810146ab24c88bc6cc562d81746b9bf5303396ed
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Ara Ebrahimi <ara.ebrah...@argyledata.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-HasComments: Yes
