Re: Spark 2.0.2 - JdbcRelationProvider does not allow create table as select

2017-07-07 Thread Kanagha Kumar
Hi all, Bumping this again! Please let me know if anyone has faced this on a 2.0.x version. I am using Spark 2.0.2 at runtime. Based on the comments, I will open a bug if necessary. Thanks! On Thu, Jul 6, 2017 at 4:00 PM, Kanagha Kumar <kpra...@salesforce.com> wrote: > Hi, > > I'

Spark 2.0.2 - JdbcRelationProvider does not allow create table as select

2017-07-06 Thread Kanagha Kumar
Hi, I'm running Spark 2.0.2 and I'm noticing an issue with DataFrameWriter.save(). Code: ds.write().format("jdbc").mode("overwrite").options(ImmutableMap.of( "driver", "org.apache.phoenix.jdbc.PhoenixDriver", "url", urlWithTenant,
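For context: in 2.0.x the jdbc data source did not implement table creation for the save() path, which is what produces the "does not allow create table as select" error; the dedicated jdbc() writer method takes a different code path. A minimal Java sketch of that commonly suggested workaround, reusing the driver and urlWithTenant values from the post ("MY_TABLE" is a hypothetical table name):

    import java.util.Properties;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;

    // Sketch: write through DataFrameWriter.jdbc() instead of format("jdbc").save().
    Properties props = new Properties();
    props.setProperty("driver", "org.apache.phoenix.jdbc.PhoenixDriver");
    ds.write().mode(SaveMode.Overwrite).jdbc(urlWithTenant, "MY_TABLE", props);

Whether Overwrite semantics (drop and recreate) are acceptable for a Phoenix table is worth checking separately.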

Re: Error while doing mvn release for spark 2.0.2 using scala 2.10

2017-06-21 Thread Kanagha Kumar
(pom.xml fragments: org.scala-lang; common/sketch/pom.xml; org.apache.spark spark-tags_${scala.binary.version}) On Mon, Jun 19, 2017 at 2:25 PM, Kanagha Kumar <kpra...@salesforce.com> wrote: > Thanks. But, I am required to do a maven release to Nexus on sp

Spark submit - org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run error

2017-06-22 Thread Kanagha Kumar
Hi, I am *intermittently* seeing this error while doing spark-submit with Spark 2.0.2 / Scala 2.11. I see the same issue reported in https://issues.apache.org/jira/browse/SPARK-18343 and it is marked RESOLVED. I can run successfully most of the time, though. Hence I'm unsure if it is

Re: Error while doing mvn release for spark 2.0.2 using scala 2.10

2017-06-19 Thread Kanagha Kumar
projects (such as spark-tags) are Java projects. Spark doesn't fix the artifact name and just hard-codes 2.11. For your issue, try to use `install` rather than `package`. On Sat, Jun 17, 2017 at 7:20 PM, Kanagha Kumar <kpra...@salesforce.com> wrote: >> Hi, >

Error while doing mvn release for spark 2.0.2 using scala 2.10

2017-06-16 Thread Kanagha Kumar
Hey all, I'm trying to use Spark 2.0.2 with Scala 2.10 by following https://spark.apache.org/docs/2.0.2/building-spark.html#building-for-scala-210: ./dev/change-scala-version.sh 2.10 ./build/mvn -Pyarn -Phadoop-2.4 -Dscala-2.10 -DskipTests clean package I could build the distribution

Re: Error while doing mvn release for spark 2.0.2 using scala 2.10

2017-06-17 Thread Kanagha Kumar
Hi, Bumping this up again! Why do Spark modules depend on Scala 2.11 versions in spite of changing the pom.xmls using ./dev/change-scala-version.sh 2.10? Appreciate any quick help!! Thanks On Fri, Jun 16, 2017 at 2:59 PM, Kanagha Kumar <kpra...@salesforce.com> wrote: > Hey all, >

Reading from HDFS by increasing split size

2017-10-10 Thread Kanagha Kumar
Hi, I'm trying to read a 60 GB HDFS file using spark textFile("hdfs_file_path", minPartitions). How can I control the no. of tasks by increasing the split size? With the default split size of 250 MB, several tasks are created, but I would like a specific no. of tasks created while reading from
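For what it's worth, the minPartitions argument can only increase the number of splits, never reduce it; to get fewer, larger tasks you raise the minimum split size in the Hadoop configuration (or coalesce after reading). A sketch under that assumption, using the Hadoop 2.x property name:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder().appName("SplitSizeDemo").getOrCreate();
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

    // Raise the minimum split size to 1 GB so a 60 GB file yields roughly 60 tasks.
    // mapreduce.input.fileinputformat.split.minsize is the Hadoop 2.x key; the older
    // mapred.min.split.size spelling is its deprecated alias.
    jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.minsize",
        String.valueOf(1024L * 1024 * 1024));

    JavaRDD<String> lines = jsc.textFile("hdfs_file_path");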

Re: Reading from HDFS by increasing split size

2017-10-10 Thread Kanagha Kumar
e or split the file yourself beforehand (not recommended). > On 10. Oct 2017, at 09:14, Kanagha Kumar <kpra...@salesforce.com> wrote: > Hi, > I'm trying to read a 60GB HDFS file using spark textFile("

Re: Reading from HDFS by increasing split size

2017-10-10 Thread Kanagha Kumar
the Hadoop web page should tell you ;-) > On 10. Oct 2017, at 17:53, Kanagha Kumar <kpra...@salesforce.com> wrote: > Thanks for the inputs!! > I passed in spark.mapred.max.split.size, spark.mapred.min.split.size to the size I wanted to rea
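A note on those property names: a setting only takes effect if it reaches the Hadoop Configuration, and plain spark.mapred.* keys do not; Spark copies conf entries prefixed with spark.hadoop.* into hadoopConfiguration. A hedged sketch of that route (size in bytes; the Hadoop 2.x key name is an assumption about the reader's input format):

    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.SparkSession;

    // Keys prefixed with spark.hadoop. are copied into the Hadoop Configuration,
    // so the file input format actually sees them.
    SparkConf conf = new SparkConf()
        .setAppName("SplitSizeViaConf")
        .set("spark.hadoop.mapreduce.input.fileinputformat.split.minsize",
             String.valueOf(1024L * 1024 * 1024));
    SparkSession spark = SparkSession.builder().config(conf).getOrCreate();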

Replicating a row n times

2017-09-28 Thread Kanagha Kumar
Hi, I'm trying to replicate a single row from a dataset n times and create a new dataset from it. But while replicating, I need one column's value to change for each replica, since it will end up as the primary key when finally stored. Looked at the following reference:
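One way to express this in the DataFrame API is to explode an n-element index array and fold the index into the key column. A sketch; replicate, copy_idx, and pk are illustrative names, not from the thread:

    import static org.apache.spark.sql.functions.*;
    import org.apache.spark.sql.Column;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Replicate each row n times, tagging each copy with its index 0..n-1.
    static Dataset<Row> replicate(Dataset<Row> df, int n) {
      Column[] indices = new Column[n];
      for (int i = 0; i < n; i++) {
        indices[i] = lit(i);
      }
      return df.withColumn("copy_idx", explode(array(indices)));
    }

The new key can then be derived per copy, e.g. withColumn("pk", concat(col("pk"), lit("_"), col("copy_idx"))), so every replica stays unique.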

Re: Replicating a row n times

2017-09-29 Thread Kanagha Kumar
ayan guha <guha.a...@gmail.com> wrote: > How about using row number for primary key? > Select row_number() over (), * from table > On Fri, 29 Sep 2017 at 10:21 am, Kanagha Kumar <kpra...@salesforce.com> wrote: >> Hi, >> I'm trying to repl
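If you go the row-number route, note that in Spark SQL row_number() needs an ordered window; an empty over () is rejected by the analyzer. A sketch, ordering by the hypothetical copy_idx column from the example above (monotonically_increasing_id() is a window-free alternative when gaps in the numbering are acceptable):

    import static org.apache.spark.sql.functions.*;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.expressions.Window;

    // Deterministic 1..n numbering; a window without partitionBy pulls all rows
    // into a single partition, which is fine for a handful of replicated rows.
    Dataset<Row> numbered =
        copies.withColumn("rn", row_number().over(Window.orderBy(col("copy_idx"))));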

Error - Spark reading from HDFS via dataframes - Java

2017-09-30 Thread Kanagha Kumar
Hi, I'm trying to read data from HDFS in Spark as dataframes. Printing the schema, I see all columns being read as strings. I'm converting it to RDDs and creating another dataframe by passing in the correct schema (how the rows should finally be interpreted). I'm getting the following
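If the underlying files are delimited text (the thread doesn't say), the RDD round trip can usually be avoided by handing the schema to the reader up front. A sketch with hypothetical column names and types, assuming an existing SparkSession named spark:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    // Hypothetical schema; with an explicit schema the CSV reader parses each
    // column to its declared type instead of defaulting everything to string.
    StructType schema = new StructType()
        .add("id", DataTypes.LongType)
        .add("name", DataTypes.StringType)
        .add("score", DataTypes.DoubleType);

    Dataset<Row> df = spark.read()
        .schema(schema)
        .csv("hdfs://namenode/path/to/file");  // placeholder path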