Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-30 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163545
---



Thanks Hai for the docs! Committed to master.

- Jagadish Venkatraman


On Jan. 28, 2017, 6:15 a.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 28, 2017, 6:15 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Hai Lu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
---

(Updated Jan. 28, 2017, 6:15 a.m.)


Review request for samza.


Bugs: SAMZA-1025
https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description
---

documentation for hdfs system consumer


Diffs (updated)
-

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md 
b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html 
d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
---

N/A


Thanks,

Hai Lu



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163355
---


Fix it, then Ship it!




Some nits and comments. Otherwise, looks good. Thanks! +1


docs/learn/documentation/versioned/hdfs/consumer.md (line 39)


This line is confusing. Are you implying that I can read from non-avro 
formatted files that are in HDFS ? 
What is the significance of the SingleFileHdfsReader interface ? It is not 
clear to the reader.



docs/learn/documentation/versioned/hdfs/consumer.md (line 89)


Nit: Can you move the explanation of what advanced partitioning is  outside 
of the code block? 
You can emphasize the reserved term note by doing -> 
**note**  , when it is outside the code block



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1822)


Look like a typo. It should "systems.*, instead of "system.*" ?


- Navina Ramesh


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Navina Ramesh


> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 67
> > 
> >
> > The relationship between whitelist and blacklist was not very obvious 
> > to me.
> > 
> > Is the behavior that the whitelist is applied first, and the blacklist 
> > is applied to the matched files later? (to determine which files are to be 
> > ignored).
> 
> Hai Lu wrote:
> The order doesn't matter. (X & whitelist) - blacklist == (X - blacklist) 
> & whitelist

This is assuming that whitelist and blacklist are mutually exclusive, right?


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
---


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Navina Ramesh


> On Jan. 25, 2017, 10:50 p.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 26
> > 
> >
> > Can you include the diagram from your design document?  Or something 
> > similar to elaborate how the setup should look like?
> 
> Hai Lu wrote:
> The diagram was mostly for the situation at LinkedIn where we have 
> separte yarn clusters - one for Samza, one for Hadoop. "Your job needs to run 
> on the same YARN cluster which hosts the HDFS you want to consume from."  Is 
> this statement not clear enough? What suggestion do you have in terms of the 
> wording?

My bad. I thought it was generic architecture diagram. Didn't realize it was 
specific to LinkedIn's deployment. Please ignore this comment.


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163036
---


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review16
---


Ship it!




Thanks!


docs/learn/documentation/versioned/hdfs/consumer.md (line 78)


I meant use a dummy name. not the `dummyname` as string. Re-name it to 
`your-principal-name`



docs/learn/documentation/versioned/hdfs/consumer.md (line 89)


nit: the way you want/arbitrarily


- Jagadish Venkatraman


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Hai Lu


> On Jan. 27, 2017, 7:03 a.m., Jagadish Venkatraman wrote:
> > This is looking pretty good. Thank you for the effort in writing the docs!

Thank you for taking the time to review it. Really appreciate it:)


> On Jan. 27, 2017, 7:03 a.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 30
> > 
> >
> > `to process these partitions` appears twice? Copy-paste error?

Oops, sorry for such a reckless mistake.


- Hai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163234
---


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Hai Lu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
---

(Updated Jan. 27, 2017, 5:48 p.m.)


Review request for samza.


Bugs: SAMZA-1025
https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description
---

documentation for hdfs system consumer


Diffs (updated)
-

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md 
b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html 
d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
---

N/A


Thanks,

Hai Lu



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-26 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163234
---



This is looking pretty good. Thank you for the effort in writing the docs!


docs/learn/documentation/versioned/hdfs/consumer.md (line 30)


`to process these partitions` appears twice? Copy-paste error?



docs/learn/documentation/versioned/hdfs/consumer.md (line 31)


s/one single/a single


- Jagadish Venkatraman


On Jan. 27, 2017, 12:03 a.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 12:03 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-26 Thread Hai Lu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
---

(Updated Jan. 27, 2017, 12:03 a.m.)


Review request for samza.


Bugs: SAMZA-1025
https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description
---

documentation for hdfs system consumer


Diffs (updated)
-

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md 
b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html 
d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
---

N/A


Thanks,

Hai Lu



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-26 Thread Hai Lu


> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 92
> > 
> >
> > nit: Use capitalizations consistently
> > 
> > 1. Is `id` of any significance here?

Yes. it has to be exactly "[id]"


> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1832
> > 
> >
> > Do we allow any other `partitioner` other than the defaultPartitioner? 
> > If not, it feels slightly unconventional to have `default` in the property 
> > name?

This was discussed in the code review for implementation. And yes, we do allow 
other type of partitioner potentially in the future. That's why I was suggested 
to put "defaultPartitioner" here.


> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1839
> > 
> >
> > Is the group identifier the partition name? 
> > Is `id` a reserved term? 
> > Do we always expect `[id]`?

yes, it's a reserved term. updated the doc.


> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1847
> > 
> >
> > The other minor observation is that the camel case naming scheme is 
> > slightly inconsistent with what we have for our public configs.
> > 
> > For instance, job.coordinator.
> > monitor-partition-change follows a different convention. I'd really 
> > prefer consistency in our config scheme. 
> > 
> > Navina or Prateek thoughts?

I'm not sure what kind of discussion was going on there. But when I implemented 
the codes, I was explicitly told to adopt such a style (camel case).


> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1842
> > 
> >
> > Is this the name of the class? I'm still not clear how this is used?

It's literally "avro", or "plain", or "json". Though the last two are not 
supported now. No, they are not class name.


- Hai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163177
---


On Jan. 26, 2017, 6:47 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 26, 2017, 6:47 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-26 Thread Jagadish Venkatraman


> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 97
> > 
> >
> > Not clear to me how this differs from the whitelist (*.avro which 
> > specifies what files to process).
> 
> Hai Lu wrote:
> They completely different: reader is the type of data encoded in the file 
> content; whitelist is used to filter based on file name. Technically you can 
> have an avro file that has ".java" as it's extention, right?

But, as a user of the API, it was not obvious to me how the value of `avro` is 
used. (or what it's significance is)

Does this have to be tied with the serde?


- Jagadish


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
---


On Jan. 26, 2017, 6:47 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 26, 2017, 6:47 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-26 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163177
---




docs/learn/documentation/versioned/hdfs/consumer.md (line 22)


Re-word to be concise.

Avro encoded records are supported out of the box and it is easy to extend 
to support other formats (plain text, csv, json etc).
`in the future` is probably not needed.



docs/learn/documentation/versioned/hdfs/consumer.md (line 30)


nit: How do new lines look when rendered in the web-page? Do they appear 
like paragraphs? If so, do you think this can be made a single paragraph? 
(instead of 3).



docs/learn/documentation/versioned/hdfs/consumer.md (line 60)


typo here task.inputs



docs/learn/documentation/versioned/hdfs/consumer.md (line 68)


Would be good to provide dummy values? For instance, the above property 
(`whitelist`) uses it consistently.



docs/learn/documentation/versioned/hdfs/consumer.md (line 76)


Redundant with previous line. Can potentially abridge?



docs/learn/documentation/versioned/hdfs/consumer.md (line 77)


contains both a '.' and a ':'. Not sure if both are required.



docs/learn/documentation/versioned/hdfs/consumer.md (line 81)


Use newlines after defining properties consistently.



docs/learn/documentation/versioned/hdfs/consumer.md (line 83)


Can provide a dummy value here consistently?



docs/learn/documentation/versioned/hdfs/consumer.md (line 89)


Use periods consistently. For example, the above section ended with a 
period.



docs/learn/documentation/versioned/hdfs/consumer.md (line 92)


nit: Use capitalizations consistently

1. Is `id` of any significance here?



docs/learn/documentation/versioned/hdfs/consumer.md (line 93)


Please provide a value.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1819)


Thanks for calling out the trade-offs in the docs.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1832)


Do we allow any other `partitioner` other than the defaultPartitioner? If 
not, it feels slightly unconventional to have `default` in the property name?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1839)


Is the group identifier the partition name? 
Is `id` a reserved term? 
Do we always expect `[id]`?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1840)


Thanks for adding the example.

Re-word docs.

That you want to organize into `three` partitions as ...



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1842)


Is this the name of the class? I'm still not clear how this is used?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1847)


The other minor observation is that the camel case naming scheme is 
slightly inconsistent with what we have for our public configs.

For instance, job.coordinator.
monitor-partition-change follows a different convention. I'd really prefer 
consistency in our config scheme. 

Navina or Prateek thoughts?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1849)


typo: be default ? Did you mean By default?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1850)


Typo: Did you want to use two periods `..`?


- Jagadish Venkatraman


On Jan. 26, 2017, 6:47 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 26, 2017, 6:47 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> 

Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-26 Thread Hai Lu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
---

(Updated Jan. 26, 2017, 6:47 p.m.)


Review request for samza.


Bugs: SAMZA-1025
https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description
---

documentation for hdfs system consumer


Diffs (updated)
-

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md 
b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html 
d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
---

N/A


Thanks,

Hai Lu



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-26 Thread Hai Lu


> On Jan. 25, 2017, 10:50 p.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 26
> > 
> >
> > Can you include the diagram from your design document?  Or something 
> > similar to elaborate how the setup should look like?

The diagram was mostly for the situation at LinkedIn where we have separte yarn 
clusters - one for Samza, one for Hadoop. "Your job needs to run on the same 
YARN cluster which hosts the HDFS you want to consume from."  Is this statement 
not clear enough? What suggestion do you have in terms of the wording?


> On Jan. 25, 2017, 10:50 p.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 42
> > 
> >
> > Can you add a snippet of the interface here as well? It is easier for 
> > the user to skim through.

Linked to the java doc.


> On Jan. 25, 2017, 10:50 p.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 50
> > 
> >
> > Replace "users" with "user application". 
> > 
> > We provide the capability for the user application to get notified when 
> > ...
> > 
> > Rephrase "To do so, simply implement the interface 
> > [EndOfStreamListenerTask]" as "In order to receieve notications on 
> > EndOfStream with the task, the user application should simply implement the 
> > interface ..."

I changed the wording as Jagadish suggested above. Let me know if you have 
further suggestion on top of that.


- Hai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163036
---


On Jan. 24, 2017, 2:07 a.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 24, 2017, 2:07 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-26 Thread Hai Lu


> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 67
> > 
> >
> > The relationship between whitelist and blacklist was not very obvious 
> > to me.
> > 
> > Is the behavior that the whitelist is applied first, and the blacklist 
> > is applied to the matched files later? (to determine which files are to be 
> > ignored).

The order doesn't matter. (X & whitelist) - blacklist == (X - blacklist) & 
whitelist


> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 97
> > 
> >
> > Not clear to me how this differs from the whitelist (*.avro which 
> > specifies what files to process).

They completely different: reader is the type of data encoded in the file 
content; whitelist is used to filter based on file name. Technically you can 
have an avro file that has ".java" as it's extention, right?


> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1819
> > 
> >
> > What's this configuration? Is this the number of messages? What are the 
> > implications of this? 
> > 
> > I'm not in favor of exposing this tunable if this is not 
> > super-significant.

This is important for performance tuning in some cases. Added a bit more 
details to explain


- Hai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
---


On Jan. 24, 2017, 2:07 a.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 24, 2017, 2:07 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-25 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163036
---



Thanks for adding the documentation!


docs/learn/documentation/versioned/hdfs/consumer.md (line 26)


Can you include the diagram from your design document?  Or something 
similar to elaborate how the setup should look like?



docs/learn/documentation/versioned/hdfs/consumer.md (line 42)


Can you add a snippet of the interface here as well? It is easier for the 
user to skim through.



docs/learn/documentation/versioned/hdfs/consumer.md (line 50)


Replace "users" with "user application". 

We provide the capability for the user application to get notified when ...

Rephrase "To do so, simply implement the interface 
[EndOfStreamListenerTask]" as "In order to receieve notications on EndOfStream 
with the task, the user application should simply implement the interface ..."



docs/learn/documentation/versioned/hdfs/consumer.md (line 54)


I think you can skip the "job.properties" file. The readers may easily 
assume there is a separate properties file.



docs/learn/documentation/versioned/hdfs/consumer.md (line 75)


Typo "configs are required"



docs/learn/documentation/versioned/hdfs/consumer.md (line 77)


I don't think you need to mentioned the JIRA that introduced a feature. If 
there is documentation related to security in Samza, you can link to it. You 
can link to the javadoc for SamzaContainerSecurityManager.

You can add a brief description of the feature. For example, the 
SamzaContainer fetches and renews the Kerberos delegation tokens when the job 
is running in a secure environment. User application needs to specify ..



docs/learn/documentation/versioned/hdfs/consumer.md (line 93)


Can you elaborate the "advanced partitioning" feature here  and remove the 
link for design doc? If it helps, you can just copy-and-paste the design doc 
content here and edit it :)



docs/learn/documentation/versioned/hdfs/consumer.md (line 102)


Looks like there are more configurations that are mentioned in the 
configuration table. Can you please add the link to configuration table to 
imply that?



docs/learn/documentation/versioned/hdfs/consumer.md (line 105)


You can add a link to the design doc pdf here , instead of the JIRA link.


- Navina Ramesh


On Jan. 24, 2017, 2:07 a.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 24, 2017, 2:07 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-25 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
---




docs/learn/documentation/versioned/hdfs/consumer.md (line 22)


Can re-word docs for clarity.
1. Remove `Now` here: s/Now// - "You can configure your Samza jobs to read 
data from HDFS files". 
2. No need to mention that it's implemented in `samza-hdfs`.
3. Substitute: "Implemented in `samza-hdfs`, the `HdfsSystemConsumer` will 
read from HDFS files (`avro` records only for now. /->> "Only avro encoded 
records are supported for now."
4. "very easy to extend to plain text, csv, json, etc. in the future" -> 
How is this supported? A hint on how to do this is useful. If not, we can nuke 
this.
5. Reword: "Deliver messages to your tasks" can be left.



docs/learn/documentation/versioned/hdfs/consumer.md (line 26)


Thanks for calling this out explicitly.



docs/learn/documentation/versioned/hdfs/consumer.md (line 30)


Do you think it can be more concise:
- "The way `HDFSSystemConsumer` does partitioning is basically to treat 
each HDFS file as a partition. Using Kafka as an analogy, a HDFS directory is 
close to the concept of a Kafka topic, and all the files within the directory 
are like topic partitions in the sense of Kafka. "

Reword it to be: (something on the lines of.)
"Partitioning works at the level of individual HDFS files. Each file is 
treated as a stream partition."

- This line can be removed completely. The subsequent examples expands on 
the idea.



docs/learn/documentation/versioned/hdfs/consumer.md (line 34)


Reword:

You can configure upto 10 Samza containers to process these partitions.



docs/learn/documentation/versioned/hdfs/consumer.md (line 40)


1. Not sure we have a noun called `payload` anywhere else in the Samza docs.

2. Can be more precise in wording this.
The received IncomingMessageEnvelope contains three significant fields.
- The key which is empty.
- The message which is set to the avro GenericRecord.
- The stream partition which is set to the name of the HDFS file.

3. Please link the IncomingMessageEnvelope API.



docs/learn/documentation/versioned/hdfs/consumer.md (line 42)


1. Please link the javadocs of SingleFileHDFSReader
2. Suggest re-word:
2.1 one can implement/ you can implement (Since, the rest of the 
documentation is directed at the user)
2.2 The second line on "few methods need to be implemented" can be ignored.



docs/learn/documentation/versioned/hdfs/consumer.md (line 43)


It's not super-obvious how the implementation of the `SingleFileHDFSReader` 
ties to the underlying consumer. Am I missing something?



docs/learn/documentation/versioned/hdfs/consumer.md (line 46)


1. Not sure we want to introduce a new noun called `data stream` in the 
docs.

Suggest re-word to be precise:

While a kafka topic has an unbounded stream of messages, HDFS files are 
bounded and have a notion of EOF.



docs/learn/documentation/versioned/hdfs/consumer.md (line 48)


Reword docs to be concise:

You can choose to implement `EndOfStreamListenerTask` to receive a callback 
when all partitions are at end of stream. When all partitions being processed 
by the task are at end of stream (ie. EOF has been reached for all files), the 
Samza job exits automatically.

Remove `What happens when we finish reading all data from the HDFS files? 
The behavior is that the Samza job will shut itself down once the job is done 
with all files.`



docs/learn/documentation/versioned/hdfs/consumer.md (line 67)


The relationship between whitelist and blacklist was not very obvious to me.

Is the behavior that the whitelist is applied first, and the blacklist is 
applied to the matched files later? (to determine which files are to be 
ignored).



docs/learn/documentation/versioned/hdfs/consumer.md (line 75)


1.fix typo: required
2.For HDFS clusters that have kerberos enabled,...



docs/learn/documentation/versioned/hdfs/consumer.md (line 77)


Suggestions on docs:

1. Not needed to mention the jira.
2. Not needed to mention the class name.

The 

Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-23 Thread Hai Lu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
---

(Updated Jan. 24, 2017, 2:07 a.m.)


Review request for samza.


Bugs: SAMZA-1025
https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description (updated)
---

documentation for hdfs system consumer


Diffs (updated)
-

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md 
b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html 
d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
---

N/A


Thanks,

Hai Lu



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2016-11-11 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review155740
---




docs/learn/documentation/versioned/hdfs/consumer.md (line 22)


Please put more details about how to extend it to read other types of 
records, like which interface to extend. More tech details will be useful to 
other users.



docs/learn/documentation/versioned/hdfs/consumer.md (line 66)


This looks not that great to the users. Can we put the details here instead 
of just refering to a JIRA? Since this is user facing docs, please add the 
complete information.

Another note: the doc doesn't mention the job behavior when end of stream. 
That needs to be described here in detail too.



docs/learn/documentation/versioned/hdfs/producer.md (line 70)


Seems to me this is weird that the consumer page entrance is from the 
producer page. I would suggest add the link in versioned/index.html, something 
like "Reading from HDFS" before the link of "Writing to HDFS"


- Xinyu Liu


On Oct. 5, 2016, 8:54 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Oct. 5, 2016, 8:54 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation of hdfs consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> f60cd50fb197423ac3c84fd364bbe4fb3767883e 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2016-10-05 Thread Hai Lu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
---

Review request for samza.


Bugs: SAMZA-1025
https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description
---

documentation of hdfs consumer


Diffs
-

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md 
b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
f60cd50fb197423ac3c84fd364bbe4fb3767883e 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
---

N/A


Thanks,

Hai Lu