[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
Thanks for the explanation. I guess there will be a big doc change soon? I will 
check those changes too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19485
  
The reference manual and API docs are different. Below is a link to the DB2 LUW docs:
http://www-01.ibm.com/support/docview.wss?uid=swg27038855





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
@gatorsmile, sure, a detailed doc is great and I definitely support it.

The one thing I am worried about is duplication. If we add or change an option, 
we have to update both places together, and... you know how that goes.

Wouldn't it be nicer to simply leave a pointer and remove the 
duplication if possible? If I understood correctly, the options will also be 
described in more detail in the new chapter in the future, so I think simply 
redirecting to it might be feasible.

I guess it shouldn't be too difficult to make a sub-chapter for options 
only, for example, like 
http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options

Otherwise, do you think there should be different content for a 
different purpose, or do you want to leave the duplication for now as something 
to be fixed soon? If so, I am okay with that.





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-20 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19485
  
Sure, I'll be working on this over the weekend. Thanks!





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19485
  
This is the API link you referred to: 
`https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame`

I just quickly scanned them. The option descriptions are pretty rough. They 
are written for advanced developers who read the API docs and play with them. In the long 
term, we should follow what the mainstream RDBMS reference manuals do, for example:
- https://dev.mysql.com/doc/refman/5.5/en/creating-tables.html
- https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_createtable.html
- https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#ADMIN01503

I prefer having something more human-friendly. The whole SQL doc needs a 
complete reorganization. cc @jiangxb1987 Maybe you are the right person to take it on.





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
I meant adding a new chapter describing the options and removing the duplication, for 
example here 
https://github.com/apache/spark/blob/73d80ec49713605d6a589e688020f0fc2d6feab2/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L513
and then leaving a link to the new chapter instead.





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19485
  
@HyukjinKwon I did not understand your suggestion. 

@jomach Is there any reason you closed this PR, or do you plan to open a new one?





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
Could we leave a link in the API doc back to the new page for the options, 
and remove the option list from the API doc, @gatorsmile and 
@liancheng?





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
My only worry is duplication: we would have another place to update the 
option docs. The rest sounds okay to me too.





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19485
  
Appreciate it. Thanks! 





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-18 Thread jomach
Github user jomach commented on the issue:

https://github.com/apache/spark/pull/19485
  
@gatorsmile will do





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19485
  
Just checked it with @liancheng. We both think creating a separate page sounds 
good.

Also cc @rxin  





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-18 Thread jomach
Github user jomach commented on the issue:

https://github.com/apache/spark/pull/19485
  
@gatorsmile: we will have a lot of duplication.

Is that fine? I will create a completely new page like the SQL programming 
guide, name it Data Sources Guide, and add all the data sources with all their 
options (duplicating information from the API docs into the guide). Is that OK 
for everyone?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19485
  
@jomach and @HyukjinKwon 

I did not generate the doc. I think we should follow what we did for JDBC: 
http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases

We should list all the public options for each built-in data source. Thus, it makes 
sense to add a new chapter for CSV.






[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
@gatorsmile WDYT?





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-17 Thread jomach
Github user jomach commented on the issue:

https://github.com/apache/spark/pull/19485
  
Yes, I'm viewing the docs with Jekyll. I addressed that in my previous 
comment. I really don't think we should make a huge example like the JSON one. 
It's a CSV... 

What do you think?





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
Less duplication is good, but could we have content similar to 
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets? 
The examples look quite different.

Also, as far as I know, we can shorten the link to, for example, 
`api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame`
 (not tested).

You could check the HTML by following 
https://github.com/apache/spark/tree/master/docs#prerequisites. Adding a new 
chapter is actually not quite trivial, IMHO. Let's put our efforts 
together here.







[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-15 Thread jomach
Github user jomach commented on the issue:

https://github.com/apache/spark/pull/19485
  
So I removed the duplicated stuff and added the links. I intentionally did 
not add more examples, as the document is getting huge and it is hard to find things. 
What do you think?





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
Yup, I think that's what I initially intended in the JIRA. Not sure about the 
iframe idea for now. I'd just keep it simple with links.





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-14 Thread jomach
Github user jomach commented on the issue:

https://github.com/apache/spark/pull/19485
  
OK, so I will:
  - Create a new section for CSV datasets
  - Add more example options to the code in JavaSQLDataSourceExample.java 
    (and the .scala, .py and .r equivalents)
  - Link to the API docs

This means we will not see all the options on the .md page, 
and people will need to jump into the API docs. Do you agree with this?

It would be cool if we could use Jekyll to create something like an iframe and 
pull the options from the Scala API docs... Any ideas?

Please let me know if it is OK to proceed this way.





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
Thanks for taking a look at this one. Actually, I thought we should add a 
chapter like 
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets

And add links to, for example, 
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv
 for Python, 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame
 for Scala, and 
http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv-scala.collection.Seq-
 for Java to refer to the options, rather than duplicating the option list (which 
we would otherwise have to update in two places whenever we fix or add options).

We should probably add similar links for the JSON ones too.





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-13 Thread jomach
Github user jomach commented on the issue:

https://github.com/apache/spark/pull/19485
  
@HyukjinKwon I came up with this. What do you think? What I don't like about 
it is that I did not find any way to read the Javadocs into the markdown so that we 
don't have duplicates. Any ideas, or should we leave it as in this PR?





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
Sure, please take your time.





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-12 Thread jomach
Github user jomach commented on the issue:

https://github.com/apache/spark/pull/19485
  
Yes, I will do it. Please give me a few days. 





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
Oh, @jomach, I should have been clearer. I actually left it so that a follow-up 
addressing https://github.com/apache/spark/pull/19429#issuecomment-335732059 
could fix this newline issue at the same time. Would you be willing to address that 
comment here too?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19485
  
Can one of the admins verify this patch?





[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-12 Thread jomach
Github user jomach commented on the issue:

https://github.com/apache/spark/pull/19485
  
@HyukjinKwon Here is the newline fix, as the other PR is closed/merged.

