Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/999#discussion_r13840206
--- Diff: docs/sql-programming-guide.md ---
@@ -297,50 +328,152 @@ JavaSchemaRDD teenagers = sqlCtx.sql("SELECT name FROM parquetFile WHERE age >=
<div data-lang="python" markdown="1">
{% highlight python %}
+# sqlContext from the previous example is used in this example.
-peopleTable # The SchemaRDD from the previous example.
+schemaPeople # The SchemaRDD from the previous example.
# SchemaRDDs can be saved as Parquet files, maintaining the schema information.
-peopleTable.saveAsParquetFile("people.parquet")
+schemaPeople.saveAsParquetFile("people.parquet")
# Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
# The result of loading a parquet file is also a SchemaRDD.
-parquetFile = sqlCtx.parquetFile("people.parquet")
+parquetFile = sqlContext.parquetFile("people.parquet")
# Parquet files can also be registered as tables and then used in SQL statements.
parquetFile.registerAsTable("parquetFile")
-teenagers = sqlCtx.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
-
+teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
+teenNames = teenagers.map(lambda p: "Name: " + p.name)
+for teenName in teenNames.collect():
+ print teenName
{% endhighlight %}
</div>
</div>
-## Writing Language-Integrated Relational Queries
+## JSON Datasets
+<div class="codetabs">
-**Language-Integrated queries are currently only supported in Scala.**
+<div data-lang="scala" markdown="1">
+Spark SQL can automatically infer the schema of a JSON dataset and load it as a SchemaRDD.
+This conversion can be done using one of two methods in a SQLContext:
-Spark SQL also supports a domain specific language for writing queries. Once again,
-using the data from the above examples:
+* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
+* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
{% highlight scala %}
+// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
-import sqlContext._
-val people: RDD[Person] = ... // An RDD of case class objects, from the first example.
-// The following is the same as 'SELECT name FROM people WHERE age >= 10 AND age <= 19'
-val teenagers = people.where('age >= 10).where('age <= 19).select('name)
+// A JSON dataset is pointed by path.
--- End diff --
"is pointed to by path"