Hello

I downloaded Apache Spark 1.6.1, pre-built for Hadoop 2.6
<http://www.apache.org/dyn/closer.lua/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz>.
When I create a table, an empty directory with the same name is created
in /user/hive/warehouse. I created tables with the following kind of
statement:

> create table aTable (aColumn string)


When I place text files in the directory, e.g.
/user/hive/warehouse/atable/text-file, I can query their contents with,
for example, "select * from aTable". When I create a table with the
following statement, however, I can only query the specified file
(/path/to/json/file):

> CREATE TABLE jsonTable USING org.apache.spark.sql.json OPTIONS ( path
> "/path/to/json/file" )


A directory, e.g. /user/hive/warehouse/jsontable, is created, but if I put
files in there, queries do not access the contents of those files. Is this
related to managed versus external tables, or is there another reason?

Tables created with USING org.apache.spark.sql.json ... are external tables,
and tables created by specifying columns are managed. How do you make a
managed table in the same way the external tables are created above, i.e.
without specifying columns and instead deriving the columns from the JSON
content? I would expect queries on the managed table to pick up data
in files once they are put in the managed table directory, as I have
seen with the managed tables I have created so far.
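
To illustrate what I mean by deriving columns from JSON content, here is a
rough sketch in plain Python (no Spark involved). It assumes one flat JSON
object per line. The type mapping, the helper names and the table name
jsonManaged are hypothetical, and this is not how Spark infers a schema. It
only builds the column list for a "create table" statement and says nothing
about how files in the table directory would be parsed.

```python
import json

# Hypothetical mapping from Python JSON types to Hive-style column types.
HIVE_TYPES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_columns(json_lines):
    """Return {column: hive_type} inferred from JSON records, one per line."""
    columns = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for name, value in record.items():
            # Nested values and nulls fall back to string in this sketch.
            hive_type = HIVE_TYPES.get(type(value), "string")
            previous = columns.get(name)
            if previous is None:
                columns[name] = hive_type
            elif previous != hive_type:
                # Conflicting types across records also fall back to string.
                columns[name] = "string"
    return columns


def create_table_statement(table, columns):
    """Build a 'create table' statement with columns in sorted order."""
    column_list = ", ".join(
        f"{name} {hive_type}" for name, hive_type in sorted(columns.items())
    )
    return f"create table {table} ({column_list})"


# Made-up sample records, for illustration only.
sample = ['{"aColumn": "x", "count": 1}', '{"aColumn": "y", "score": 2.5}']
statement = create_table_statement("jsonManaged", infer_columns(sample))
print(statement)
# prints: create table jsonManaged (aColumn string, count bigint, score double)
```

What I am after is a way to have Spark do this step for me when it creates
the managed table.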

Thanks very much

Brendan
