Re: Spark Dataframe and HIVE

2018-02-09 Thread Gourav Sengupta
Hi Ravi, can you please post the entire code? Regards, Gourav On Fri, Feb 9, 2018 at 3:39 PM, Patrick Alwell wrote: > Might sound silly, but are you using a Hive context? > > What errors do the Hive query results return? > > > > spark =

Spark Dataframe and HIVE

2018-02-09 Thread रविशंकर नायर
All, I have been stuck on this issue for three days straight and have not found any clue. Environment: Spark 2.2.x, all configurations are correct. hive-site.xml is in Spark's conf directory. 1) I created a data frame DF1 by reading a CSV file. 2) I did manipulations on DF1. The resulting frame is passion_df.

Re: Spark Dataframe and HIVE

2018-02-09 Thread रविशंकर नायर
An update (sorry, I missed this earlier): when I do passion_df.createOrReplaceTempView("sampleview") and then spark.sql("create table sampletable as select * from sampleview"), I can see the table and can query it as well. So why does this work from Spark when the other method discussed below does not? Thanks On Fri, Feb
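The workaround described in this message can be sketched as a small helper. This is only an illustration under assumptions: the function name `create_table_from_dataframe` is hypothetical, `sampletable` is an assumed table name, and `spark` is taken to be an existing SparkSession created with Hive support.

```python
def create_table_from_dataframe(spark, df, view_name, table_name):
    """Register df as a temp view, then create a table from it with CTAS.

    Assumes `spark` is a SparkSession and `df` a DataFrame from that session.
    The identifiers are interpolated as-is, so pass only trusted names.
    """
    # The temp view is session-scoped; Hive itself never sees it.
    df.createOrReplaceTempView(view_name)
    # CREATE TABLE ... AS SELECT is executed by Spark SQL against the catalog.
    spark.sql(f"CREATE TABLE {table_name} AS SELECT * FROM {view_name}")
```

With the names from the message, the call would look like `create_table_from_dataframe(spark, passion_df, "sampleview", "sampletable")`.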

Re: Spark Dataframe and HIVE

2018-02-09 Thread Prakash Joshi
Ravi, can you send the result of SHOW CREATE TABLE your_table_name? Thanks Prakash On Feb 9, 2018 8:20 PM, "☼ R Nair (रविशंकर नायर)" < ravishankar.n...@gmail.com> wrote: > All, > > It has been three days continuously I am on this issue. Not getting any > clue. > > Environment: Spark 2.2.x, all

Re: PySpark Tweedie GLM

2018-02-09 Thread Bryan Cutler
Can you provide some code/data to reproduce the problem? On Fri, Feb 9, 2018 at 9:42 AM, nhamwey wrote: > I am using Spark 2.2.0 through Python. > > I am repeatedly getting a zero weight of sums error when trying to run a > model. This happens even when I do not

Re: Spark Dataframe and HIVE

2018-02-09 Thread Nicholas Hakobian
It's possible that the format of your table is not compatible with your version of Hive, so Spark saved it in a way such that only Spark can read it. When this happens, it prints out a very visible warning letting you know this has happened. We've seen it most frequently when trying to save a

NullPointerException issue in LDA.train()

2018-02-09 Thread Kevin Lam
heavily followed the code outlined here: http://sean.lane.sh/blog/2016/PySpark_and_LDA Any ideas or help is appreciated!! Thanks in advance, Kevin Example trace of output: 16:22:55 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 8.0 in >> stage 42.0 (TID 16163, >> royal

PySpark Tweedie GLM

2018-02-09 Thread nhamwey
I am using Spark 2.2.0 through Python. I am repeatedly getting a zero weight of sums error when trying to run a model. This happens even when I do not specify a defined weightCol = "variable" Py4JJavaError: An error occurred while calling o1295.fit. : java.lang.AssertionError: assertion failed:

Re: Spark Dataframe and HIVE

2018-02-09 Thread Patrick Alwell
Might sound silly, but are you using a Hive context? What errors do the Hive query results return? spark = SparkSession.builder.enableHiveSupport().getOrCreate() As for the second part of your question, you are creating a temp view and then subsequently creating another table from that temp view.
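One way to check the point raised here is to ask the running session which catalog it uses. The sketch below is an assumption-laden illustration, not code from the thread: the helper names `get_hive_enabled_session` and `is_hive_enabled` are hypothetical, and it relies on the `spark.sql.catalogImplementation` setting, which is `hive` when Hive support is active and `in-memory` otherwise.

```python
def get_hive_enabled_session(app_name="hive-example"):
    """Build (or reuse) a SparkSession with Hive support switched on."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()
        .getOrCreate()
    )


def is_hive_enabled(spark):
    """Return True if the session uses the Hive catalog, not the in-memory one."""
    impl = spark.conf.get("spark.sql.catalogImplementation", "in-memory")
    return impl == "hive"
```

If `is_hive_enabled(spark)` returns False, tables created from that session live in Spark's own in-memory catalog and will not show up in Hive.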

[Structured Streaming] Commit protocol to move temp files to dest path only when complete, with code

2018-02-09 Thread Dave Cameron
Hi, I have a Spark Structured Streaming job that reads from Kafka and writes Parquet files to Hive/HDFS. The files are not very large, but the Kafka source is noisy, so each Spark job takes a long time to complete. There is a significant window during which the parquet files are incomplete and

Re: [Structured Streaming] Commit protocol to move temp files to dest path only when complete, with code

2018-02-09 Thread Michael Armbrust
We didn't go this way initially because it doesn't work on storage systems that have weaker guarantees than HDFS with respect to rename. That said, I'm happy to look at other options if we want to make this configurable. On Fri, Feb 9, 2018 at 2:53 PM, Dave Cameron
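The write-then-rename idea under discussion can be illustrated on a local filesystem. This is a generic sketch, not Spark's commit protocol: the function name `write_atomically` is hypothetical, and the atomicity it relies on comes from `os.replace` on a POSIX local filesystem, which is exactly the guarantee that weaker storage systems may not provide.

```python
import os
import tempfile


def write_atomically(dest_path, data):
    """Write bytes to a temp file next to dest_path, then rename it into place.

    Readers either see no file or the complete file, never a partial one,
    as long as the underlying filesystem makes rename atomic.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file in the same directory so the rename stays
    # within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before publishing
        os.replace(tmp_path, dest_path)  # the publish step
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

On object stores, a rename is typically a copy followed by a delete, so the same pattern would not give readers an all-or-nothing view there.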

Re: [Structured Streaming] Deserializing avro messages from kafka source using schema registry

2018-02-09 Thread Michael Armbrust
This isn't supported yet, but there is ongoing work at spark-avro to enable this use case. Stay tuned. On Fri, Feb 9, 2018 at 3:07 PM, Bram wrote: > Hi, > > I couldn't find any documentation about avro message

Re: ML:One vs Rest with crossValidator for multinomial in logistic regression

2018-02-09 Thread Nicolas Paris
Bryan, this is exactly the problem. Good to hear it will be fixed in the 2.3 release. On 9 Feb 2018 at 02:17, Bryan Cutler wrote: > Nicolas, are you referring to printing the model params in that example with > "print(model1.extractParamMap())"?  There was a problem with pyspark models >

[Structured Streaming] Deserializing avro messages from kafka source using schema registry

2018-02-09 Thread Bram
Hi, I couldn't find any documentation about Avro message deserialization using PySpark Structured Streaming. My aim is to use the Confluent Schema Registry to get the per-topic schema and then parse the Avro messages with it. I found one but it was using DirectStream approach

Unsubscribe

2018-02-09 Thread wangsan
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org

H2O ML use

2018-02-09 Thread Mich Talebzadeh
Hi, Has anyone had experience of using the enterprise version of H2O by any chance? How does it compare with other tools like Cloudera Data Science Workbench please? thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw