[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 Thanks for the explanation. I guess there will be a big doc change soon? I will check those changes too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19485 The reference manual and the API docs are different. Below is a link for DB2 LUW: http://www-01.ibm.com/support/docview.wss?uid=swg27038855
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 @gatorsmile, sure, a detailed doc is great and I definitely support it. The one thing I am worried about is duplication: if we add or change an option, we have to update both places together, and you know how that goes. Wouldn't it be nicer to simply leave a pointer and remove the duplication if possible? If I understood correctly, the options will also be described in more detail in the new chapter in the future, so I think simply redirecting to it might be feasible. I guess it shouldn't be too difficult to make a sub-chapter for options only, for example like http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options Otherwise, do you think there should be different contents for different purposes, or do you want to leave the duplication just for now as something to be fixed soon? If so, I am okay with that.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19485 Sure, I'll be working on this over the weekend. Thanks!
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19485 This is the API link you referred to: `https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame` I just quickly scanned it. The option descriptions are pretty rough. They are written for advanced devs who read the API docs and play with them. In the long term, we should follow what the mainstream RDBMS reference manuals do. Something like
- https://dev.mysql.com/doc/refman/5.5/en/creating-tables.html
- https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_createtable.html
- https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#ADMIN01503

I prefer having something more human friendly. The whole SQL doc needs a complete re-org. cc @jiangxb1987 Maybe you are the right person to take it.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 I meant adding a new chapter describing the options, removing the duplication (for example, here: https://github.com/apache/spark/blob/73d80ec49713605d6a589e688020f0fc2d6feab2/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L513 ), and then leaving a link to the new chapter instead.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19485 @HyukjinKwon I did not understand what your suggestion is. @jomach Any reason you closed this PR? Or do you plan to open a new one?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 Could it be an option to leave a link in the API doc back to the new page for the options, and remove the option list from the API doc, @gatorsmile and @liancheng?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 My only worry is duplication: we would have another place to update the doc for options. Everything else sounds okay to me too.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19485 Appreciate it. Thanks!
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19485 @gatorsmile will do
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19485 Just checked it with @liancheng. We both think creating a separate page sounds good. Also cc @rxin
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19485 @gatorsmile: we will have a lot of duplication. Is that fine? I will create a completely new page like the SQL programming guide, name it Data Sources Guide, and add all the data sources with all their options (duplicating information from the API docs into the guide). Is that OK for everyone?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19485 @jomach and @HyukjinKwon I did not generate the doc. I think we should follow what we did for JDBC: http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases List all the public options for each built-in data source. Thus, it makes sense to add a new chapter for CSV.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 @gatorsmile WDYT?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19485 Yes, I'm viewing the docs with Jekyll. I addressed that in my previous comment. I really don't think we should make a huge example like the JSON one. It's a CSV ... What do you think?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 Less duplication is good, but could we have contents similar to http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets ? It looks like the examples are quite different. Also, to my knowledge, we can shorten the link to, for example, `api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame` (not tested). You could check the HTML by following https://github.com/apache/spark/tree/master/docs#prerequisites. Adding a new chapter is actually not quite trivial, IMHO. Let's put our efforts here together.
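The shortened link in the comment above was marked as not tested. A minimal Python sketch of how such a relative link would resolve, assuming the link is placed on the SQL programming guide page at the `docs/latest/` URL quoted in this thread. The variable names are illustrative, and this only checks URL resolution, not that the anchor exists in the generated Scaladoc:

```python
from urllib.parse import urljoin

# Page that would contain the link (URL quoted elsewhere in this thread).
GUIDE_PAGE = "http://spark.apache.org/docs/latest/sql-programming-guide.html"

# Shortened, relative form of the Scala API link proposed in the comment.
RELATIVE_LINK = (
    "api/scala/index.html"
    "#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame"
)

# A relative link is resolved against the directory of the page containing it.
resolved = urljoin(GUIDE_PAGE, RELATIVE_LINK)
print(resolved)
```

Under that assumption the relative form resolves to the same `docs/latest/api/scala/index.html` address as the full Scala link quoted in this thread, so the short form would follow whichever docs version the reader is browsing.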
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19485 So I removed the duplicated stuff and added the links. I purposely did not add more examples, as the document is getting huge and it is hard to find things in it. What do you think?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 Yup, I think that's what I initially intended in the JIRA. Not sure about the iframe idea for now. I'd just keep it simple, with links.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19485 OK, so I will:
- Create a new section for csv-datasets
- Add more example options to the code from JavaSQLDataSourceExample.java (and the .scala, .py and .r versions)
- Reference the links to the API docs

The effect is that we will not see all the options on the .md page, and people will need to jump to the API docs. Do you agree with this? It would be cool if we could create something like an iframe from jekyllrb and get the options from the Scala API... Any ideas? Please let me know if it is OK to proceed this way.
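One way to get the effect of the iframe idea without copying text by hand would be to keep the option descriptions in a single place and generate the guide's table from it. A minimal Python sketch, assuming a hypothetical `CSV_OPTIONS` list and `render_options_table` helper. Neither exists in Spark or Jekyll; the three options shown are a small illustrative subset, and their descriptions are paraphrased rather than copied from the API docs:

```python
from typing import List, NamedTuple


class Option(NamedTuple):
    """One data source option (hypothetical record type for this sketch)."""
    name: str
    default: str
    description: str


# Hypothetical single source of truth; an illustrative subset of CSV options.
CSV_OPTIONS: List[Option] = [
    Option("sep", ",", "Character that separates fields."),
    Option("header", "false", "Whether the first line holds column names."),
    Option("inferSchema", "false", "Whether to infer column types from the data."),
]


def render_options_table(options: List[Option]) -> str:
    """Render the options as a Markdown table for a guide page."""
    lines = ["| Option | Default | Description |", "| --- | --- | --- |"]
    for opt in options:
        lines.append(f"| `{opt.name}` | `{opt.default}` | {opt.description} |")
    return "\n".join(lines)


print(render_options_table(CSV_OPTIONS))
```

With this shape, adding or changing an option means editing one list, and both the guide and any other consumer can be regenerated from it. How such a step would plug into the actual Spark docs build is left open here.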
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 Thanks for taking a look at this one. Actually, I thought we should add a chapter like http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets and add links to, for example, https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv for Python, http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame for Scala and http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv-scala.collection.Seq- for Java to refer to the options, rather than duplicating the option list (which we would then have to update in two places whenever we fix or add options). We should probably add some links to the JSON ones too.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19485 @HyukjinKwon I came up with this. What do you think? What I don't like about it is that I did not find any way to read the Javadocs into the markdown so that we don't have duplicates. Any ideas, or should we leave it as in this PR?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 Sure, please take your time.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19485 Yes, I will do it. Give me a few days, please.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 Oh, @jomach, I should have been clear. I actually left it so that the follow-up addressing https://github.com/apache/spark/pull/19429#issuecomment-335732059 could fix this newline issue as well. Would you be willing to address that comment here too?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19485 Can one of the admins verify this patch?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19485 @HyukjinKwon Here is the enter as the other is closed / merged