Any updates on this? Or perhaps any tutorials that successfully integrate spark-csv into Zeppelin? If I can rule out the code as the problem, I can start looking into the install to see what's going wrong.
Thanks,
Ryan

On Tue, Sep 29, 2015 at 9:09 AM, Ryan <freelanceflashga...@gmail.com> wrote:

> Hi Alex,
>
> Thank you for getting back to me!
>
> The tutorial code was a bit confusing and made it seem like sqlContext was
> the proper variable to use:
>
> // Zeppelin creates and injects sc (SparkContext) and sqlContext
> (HiveContext or SqlContext)
>
> I tried as you mentioned, but am still getting similar errors. Here is the
> code I tried:
>
> %dep
> z.reset()
> z.load("com.databricks:spark-csv_2.10:1.2.0")
>
> %spark
> val crimeData = "hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv"
> sqlc.load("com.databricks.spark.csv", Map("path" -> crimeData, "header" -> "true")).registerTempTable("crimes")
>
> The %spark interpreter is bound in the settings. I clicked save again to
> make sure, then ran it again. I am getting this error:
>
> <console>:17: error: not found: value sqlc
> sqlc.load("com.databricks.spark.csv", Map("path" -> crimeData, "header" -> "true")).registerTempTable("crimes")
> ^
> <console>:13: error: not found: value %
> %spark
>
> Could it be something to do with my Zeppelin installation? The tutorial
> code ran without any issues, though.
>
> Thanks!
> Ryan
>
>
> On Mon, Sep 28, 2015 at 5:07 PM, Alexander Bezzubov <b...@apache.org> wrote:
>
>> Hi,
>>
>> thank you for your interest in Zeppelin!
>>
>> A couple of things I noticed: as you probably already know, the %dep and
>> %spark parts should always be in separate paragraphs.
>>
>> %spark already exposes the SQL context through the `sqlc` variable, so
>> you'd better use sqlc.load("...") instead.
>>
>> And of course, to be able to use the %spark interpreter in the notebook,
>> you need to make sure you have it bound (cog button, on the top right).
>>
>> Hope this helps!
>>
>> --
>> Kind regards,
>> Alex
>>
>>
>> On Mon, Sep 28, 2015 at 4:29 PM, Ryan <freelanceflashga...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> In a Zeppelin notebook, I am trying to load a CSV using the spark-csv
>>> package by Databricks. I am using the Hortonworks sandbox to run
>>> Zeppelin on. Unfortunately, the methods I have been trying have not
>>> been working.
>>>
>>> My latest attempt is:
>>>
>>> %dep
>>> z.load("com.databricks:spark-csv_2.10:1.2.0")
>>>
>>> %spark
>>> val crimeData = "hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv"
>>> sqlContext.load("hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv", Map("path" -> crimeData, "header" -> "true")).registerTempTable("crimes")
>>>
>>> This is the error I receive:
>>>
>>> <console>:16: error: not found: value sqlContext
>>> sqlContext.load("hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv", Map("path" -> crimeData, "header" -> "true")).registerTempTable("crimes")
>>> ^
>>> <console>:12: error: not found: value %
>>> %spark
>>> ^
>>>
>>> Thank you for any help in advance,
>>> Ryan
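
For anyone hitting the same errors later: both "not found: value sqlc" and "not found: value %" are what the Scala REPL reports when the %dep block and the %spark block are run in the *same* Zeppelin paragraph, so the interpreter sees `%spark` as literal Scala. A minimal sketch of the layout Alex describes, as three separate Zeppelin paragraphs (assuming Zeppelin 0.5.x with Spark 1.4+ and spark-csv 1.2.0 on the Hortonworks sandbox; depending on the Zeppelin build the injected variable may be `sqlc` or `sqlContext`, and the HDFS path is the one from the thread):

    // ---- Paragraph 1: load the dependency BEFORE the Spark interpreter starts ----
    %dep
    z.reset()
    z.load("com.databricks:spark-csv_2.10:1.2.0")

    // ---- Paragraph 2 (a separate paragraph): read the CSV, register a table ----
    %spark
    val crimeData = "hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv"
    val crimes = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")   // first line of the file is the header row
      .load(crimeData)
    crimes.registerTempTable("crimes")

    // ---- Paragraph 3: query the registered table ----
    %sql
    SELECT COUNT(*) FROM crimes

Note the %dep paragraph must be run before the Spark interpreter has started (hence z.reset(), or restart the interpreter from the settings page), otherwise the artifact is not on the interpreter's classpath.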