Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-09-30 Thread Marco Mistroni
Hi all this problem is still bothering me. Here's my setup - Ubuntu 16.06 - Java 8 - Spark 2.0 - have launched the following command: ./build/mvn -X -Pyarn -Phadoop-2.7 -DskipTests clean package and i am getting this exception: org.apache.maven.lifecycle.LifecycleExecutionException: Fail

Re: Spark 2.0 issue

2016-09-29 Thread Xiao Li
Hi, Ashish, Will take a look at this soon. Thanks for reporting this, Xiao 2016-09-29 14:26 GMT-07:00 Ashish Shrowty : > If I try to inner-join two dataframes which originated from the same initial > dataframe that was loaded using spark.sql() call, it results in an error - > > // reading f

Spark 2.0 issue

2016-09-29 Thread Ashish Shrowty
If I try to inner-join two dataframes which originated from the same initial dataframe that was loaded using spark.sql() call, it results in an error - // reading from Hive .. the data is stored in Parquet format in Amazon S3 val d1 = spark.sql("select * from ") val df1 = d1.groupBy("
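For readers who want to reproduce this, a minimal sketch of the pattern described above (table and column names are made up, and this only illustrates the shape of the query, not the reported error itself):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  val spark = SparkSession.builder().appName("self-join-check").enableHiveSupport().getOrCreate()

  // Parquet-backed Hive table on S3, as in the report
  val d1 = spark.sql("select * from some_db.some_table")

  // two aggregates derived from the same parent DataFrame ...
  val totals = d1.groupBy("companyid").agg(sum("amount").as("total"))
  val counts = d1.groupBy("companyid").agg(count("amount").as("cnt"))

  // ... inner-joined back together: the step that reportedly fails
  val joined = totals.join(counts, Seq("companyid"))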

Question about single/multi-pass execution in Spark-2.0 dataset/dataframe

2016-09-27 Thread Spark User
with batch interval in the order of 5 - 10 mins 2) Data set is large in the order of millions of records per batch 3) I'm using spark 2.0 The above implementation doesn't seem to be efficient at all, if data set goes through the Rows for every count aggregation for computing attr1Co

Re: Tutorial error - zeppelin 0.6.2 built with spark 2.0 and mapr

2016-09-26 Thread Nirav Patel
FYI, it works when I use MapR configured Spark 2.0, i.e. export SPARK_HOME=/opt/mapr/spark/spark-2.0.0-bin-without-hadoop Thanks Nirav On Mon, Sep 26, 2016 at 3:45 PM, Nirav Patel wrote: > Hi, > > I built zeppelin 0.6 branch using spark 2.0 using the following mvn : > > mvn clean

Tutorial error - zeppelin 0.6.2 built with spark 2.0 and mapr

2016-09-26 Thread Nirav Patel
Hi, I built zeppelin 0.6 branch using spark 2.0 using the following mvn: mvn clean package -Pbuild-distr -Pmapr41 -Pyarn -Pspark-2.0 -Pscala-2.11 -DskipTests The build was successful. I only have the following set in zeppelin-conf.sh export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.5.1/ export

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-26 Thread Koert Kuipers
;> >>>>> What type cluster you are running on? YARN? And what distribution? >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Sun, Sep 4, 2016 at 8:48 PM -0700, "Holden Karau" <

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-26 Thread Koert Kuipers
? And what distribution? >>>> >>>> >>>> >>>> >>>> >>>> On Sun, Sep 4, 2016 at 8:48 PM -0700, "Holden Karau" < >>>> hol...@pigscanfly.ca> wrote: >>>> >>>> You really shouldn&

Re: Spark 2.0 Structured Streaming: sc.parallelize in foreach sink cause Task not serializable error

2016-09-26 Thread Michael Armbrust
eaming in Spark > 2.0. I use the Structured Streaming to perform windowing by event time. I > can print out the result in the console. I would like to write the result > to Cassandra database through the foreach sink option. I am trying to use > the spark-cassandra-connector to

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-26 Thread Piotr Smoliński
>>>> What type cluster you are running on? YARN? And what distribution? >>>> >>>> >>>> >>>> >>>> >>>> On Sun, Sep 4, 2016 at 8:48 PM -0700, "Holden Karau" < >>>> hol...@pigscanf

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-26 Thread Rex X
>> >>> >>> >>> On Sun, Sep 4, 2016 at 8:48 PM -0700, "Holden Karau" < >>> hol...@pigscanfly.ca> wrote: >>> >>> You really shouldn't mix different versions of Spark between the master >>> and worker nodes, if your

Spark 2.0 Structured Streaming: sc.parallelize in foreach sink cause Task not serializable error

2016-09-25 Thread Jianshi
Dear all: I am trying out the new released feature of structured streaming in Spark 2.0. I use the Structured Streaming to perform windowing by event time. I can print out the result in the console. I would like to write the result to Cassandra database through the foreach sink option. I am
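The error in the subject usually comes from touching the SparkContext inside the sink. A minimal sketch of a foreach sink that instead opens its own connection on the executor and writes each row directly with the DataStax Java driver (host, keyspace, table and column names below are assumptions):

  import org.apache.spark.sql.{ForeachWriter, Row}
  import com.datastax.driver.core.{Cluster, Session}

  class CassandraSink(host: String, keyspace: String) extends ForeachWriter[Row] {
    @transient private var cluster: Cluster = _
    @transient private var session: Session = _

    // one connection per partition/epoch, opened on the executor
    override def open(partitionId: Long, version: Long): Boolean = {
      cluster = Cluster.builder().addContactPoint(host).build()
      session = cluster.connect(keyspace)
      true
    }

    // write each row itself; no SparkContext or sc.parallelize involved
    override def process(row: Row): Unit =
      session.execute(
        s"INSERT INTO window_counts (window_start, cnt) VALUES ('${row.get(0)}', ${row.getLong(1)})")

    override def close(errorOrNull: Throwable): Unit = {
      if (session != null) session.close()
      if (cluster != null) cluster.close()
    }
  }

  // windowedCounts.writeStream.foreach(new CassandraSink("cassandra-host", "my_keyspace")).start()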

Bizarre behavior using Datasets/ML on Spark 2.0

2016-09-21 Thread Miles Crawford
Hello folks. I recently migrated my application to Spark 2.0, and everything worked well, except for one function that uses "toDS" and the ML libraries. This stage used to complete in 15 minutes or so on 1.6.2, and now takes almost two hours. The UI shows very strange behavior - comple

Re: is there any bug for the configuration of spark 2.0 cassandra spark connector 2.0 and cassandra 3.0.8

2016-09-20 Thread Todd Nist
These types of questions would be better asked on the user mailing list for the Spark Cassandra connector: http://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user Version compatibility can be found here: https://github.com/datastax/spark-cassandra-connector#version-compa

is there any bug for the configuration of spark 2.0 cassandra spark connector 2.0 and cassandra 3.0.8

2016-09-19 Thread muhammet pakyürek
please tell me the configuration including the most recent version of cassandra, spark and cassandra spark connector

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-18 Thread Felix Cheung
ature of data frame which is available since Spark 1.6. But the spark of current cluster is version 1.5. Can we install Spark 2.0 on the master node to work around this? Thanks! -- Cell : 425-233-8271 Twitter: https://twitter.com/holdenkarau -- Cell : 425-233-8271 Twitter: https://twitter

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-18 Thread Chris Fregly
park between the master >> and worker nodes, if your going to upgrade - upgrade all of them. Otherwise >> you may get very confusing failures. >> >> On Monday, September 5, 2016, Rex X wrote: >> >>> Wish to use the Pivot Table feature of data frame which is availabl

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-10 Thread Holden Karau
> On Monday, September 5, 2016, Rex X > wrote: > >> Wish to use the Pivot Table feature of data frame which is available >> since Spark 1.6. But the spark of current cluster is version 1.5. Can we >> install Spark 2.0 on the master node to work around this? >> >

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-10 Thread Felix Cheung
6. But the spark of current cluster is version 1.5. Can we install Spark 2.0 on the master node to work around this? Thanks! -- Cell : 425-233-8271 Twitter: https://twitter.com/holdenkarau

Re: Spark 2.0 with Kafka 0.10 exception

2016-09-07 Thread Cody Koeninger
; >> >> > >> >> > org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193) >> >> > >> >> > 16/09/07 16:00:02 INFO CoarseGrainedExecutorBackend: Got assigned >> >> > task >> >> &g

Re: Spark 2.0 with Kafka 0.10 exception

2016-09-07 Thread Srikanth
umer: Initial fetch for > >> > spark-executor-StreamingPixelCount1 mt_event 0 57098866 > >> > > >> > 16/09/07 16:00:03 INFO Executor: Finished task 1.1 in stage 138.0 (TID > >> > 7854). 1103 bytes result sent to driver > >> > > >&

Re: Spark 2.0 with Kafka 0.10 exception

2016-09-07 Thread Cody Koeninger
; > On Wed, Aug 24, 2016 at 2:13 PM, Srikanth wrote: >> >> >> >> Thanks Cody. Setting poll timeout helped. >> >> Our network is fine but brokers are not fully provisioned in test >> >> cluster. >> >> But there isn't enough load to ma

Re: Spark 2.0 with Kafka 0.10 exception

2016-09-07 Thread Srikanth
at running on the same node doesn't have any issues. > >> > >> Srikanth > >> > >> On Tue, Aug 23, 2016 at 9:52 PM, Cody Koeninger > >> wrote: > >>> > >>> You can set that poll timeout higher with > >>> > >&

Re: Spark 2.0 with Kafka 0.10 exception

2016-09-07 Thread Cody Koeninger
Koeninger >> wrote: >>> >>> You can set that poll timeout higher with >>> >>> spark.streaming.kafka.consumer.poll.ms >>> >>> but half a second is fairly generous. I'd try to take a look at >>> what's going on with your n
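For reference, a minimal sketch of raising the poll timeout discussed above (the value and app name are only examples):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setAppName("kafka-010-stream")
    .set("spark.streaming.kafka.consumer.poll.ms", "10000") // raise if brokers respond slowly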

Re: Spark 2.0 with Kafka 0.10 exception

2016-09-07 Thread Srikanth
a.consumer.poll.ms >> >> but half a second is fairly generous. I'd try to take a look at >> what's going on with your network or kafka broker during that time. >> >> On Tue, Aug 23, 2016 at 4:44 PM, Srikanth wrote: >> > Hello, >> > >&g

Managing Dataset API Partitions - Spark 2.0

2016-09-07 Thread ANDREA SPINA
Hi everyone, I'd like to test some algorithms with the Dataset API offered by Spark 2.0.0. So I was wondering, which is the best way of managing Dataset partitions? E.g. in the data reading phase, what I usually do is the following // RDD // if I want to set a custom minimum number of partitions v
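As a rough sketch of what is usually reached for here (path and partition counts are only examples): the Dataset readers have no minPartitions argument, so an explicit repartition after the read, plus spark.sql.shuffle.partitions for shuffle stages, are the common substitutes.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("dataset-partitions").getOrCreate()

  // repartition after reading instead of passing minPartitions to the reader
  val ds = spark.read.textFile("hdfs:///data/input").repartition(200)

  // partitions produced by shuffles/aggregations on Datasets/DataFrames
  spark.conf.set("spark.sql.shuffle.partitions", "200")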

Re[10]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-07 Thread Сергей Романов
35 seconds) With -XX:-DontCompileHugeMethods: 30 rows selected (1.086 seconds) 30 rows selected (1.051 seconds) 30 rows selected (1.073 seconds) >Wednesday, 7 September 2016, 0:35 +03:00 from Yong Zhang : > >This is an interesting point. > >I tested with original data with Spark 2.0 re

Re: Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-06 Thread Yong Zhang
This is an interesting point. I tested with the original data on the Spark 2.0 release, and I get the same statistics as in the original email, like the following: 50 1.77695393562 51 0.695149898529 52 0.638142108917 53 0.647341966629 54 0.663456916809 55 0.629166126251 56 0.644149065018 57

Re: Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-06 Thread Davies Liu
I think the slowness is caused by the generated aggregate method having more than 8K of bytecode, so it's not JIT compiled and becomes much slower. Could you try disabling DontCompileHugeMethods with: -XX:-DontCompileHugeMethods On Mon, Sep 5, 2016 at 4:21 AM, Сергей Романов wrote: > Hi, Gavin, > > Shu
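A minimal sketch of one way to pass that JVM flag through Spark configuration (an assumption; it can equally go in spark-defaults.conf or on the spark-submit command line):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("wide-aggregation")
    .config("spark.executor.extraJavaOptions", "-XX:-DontCompileHugeMethods")
    // for the driver in client mode this has to be set via spark-submit / spark-defaults instead
    .config("spark.driver.extraJavaOptions", "-XX:-DontCompileHugeMethods")
    .getOrCreate()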

Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-05 Thread Сергей Романов
Hi, Gavin, Shuffling is exactly the same in both requests and is minimal. Both requests produce one shuffle task. Running time is the only difference I can see in metrics: timeit.timeit(spark.read.csv('file:///data/dump/test_csv', schema=schema).groupBy().sum(*(['dd_convs'] * 57) ).collect,

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-04 Thread Holden Karau
which is available since > Spark 1.6. But the spark of current cluster is version 1.5. Can we install > Spark 2.0 on the master node to work around this? > > Thanks! > -- Cell : 425-233-8271 Twitter: https://twitter.com/holdenkarau

Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-04 Thread Rex X
Wish to use the Pivot Table feature of data frame which is available since Spark 1.6. But the spark of current cluster is version 1.5. Can we install Spark 2.0 on the master node to work around this? Thanks!

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-09-03 Thread Diwakar Dhanuskodi
Please run with -X and post logs here. We can get exact error from it. On Sat, Sep 3, 2016 at 7:24 PM, Marco Mistroni wrote: > hi all > > i am getting failures when building spark 2.0 on Ubuntu 16.06 > Here's details of what i have installed on the ubuntu host > - j

Re: Re[6]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-03 Thread Gavin Yue
Any shuffling? > On Sep 3, 2016, at 5:50 AM, Сергей Романов wrote: > > Same problem happens with CSV data file, so it's not parquet-related either. > > > Welcome to [Spark welcome banner]

Re: Catalog, SessionCatalog and ExternalCatalog in spark 2.0

2016-09-03 Thread Kapil Malik
achieved by extending HiveContext, and correspondingly >> HiveCatalog. I have my own implementation of trait "Catalog", which >> over-rides the "lookupRelation" method to do the magic behind the scenes. >> >> However, in spark 2.0, I can see foll

Re: Catalog, SessionCatalog and ExternalCatalog in spark 2.0

2016-09-03 Thread Raghavendra Pandey
og. I have my own implementation of trait "Catalog", which > over-rides the "lookupRelation" method to do the magic behind the scenes. > > However, in spark 2.0, I can see following - > SessionCatalog - which contains lookupRelation method, but doesn't have > any in

Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-09-03 Thread Marco Mistroni
hi all i am getting failures when building spark 2.0 on Ubuntu 16.06 Here's details of what i have installed on the ubuntu host - java 8 - scala 2.11 - git When i launch the command ./build/mvn -Pyarn -Phadoop-2.7 -DskipTests clean package everything compiles sort of fine and at the

Re[7]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-03 Thread Сергей Романов
And even more simple case: >>> df = sc.parallelize([1] for x in xrange(760857)).toDF() >>> for x in range(50, 70): print x, timeit.timeit(df.groupBy().sum(*(['_1'] * >>> x)).collect, number=1) 50 1.91226291656 51 1.50933384895 52 1.582903862 53 1.90537405014 54 1.84442877769 55 1.9177978 56

Re[6]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-03 Thread Сергей Романов
Same problem happens with CSV data file, so it's not parquet-related either. Welcome to [Spark welcome banner] version 2.0.0 Using Python version 2.7.6 (default, Jun 22 2015 17:58:13) SparkSessi

Re[5]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-03 Thread Сергей Романов
Hi, I had narrowed down my problem to a very simple case. I'm sending a 27kb parquet file as an attachment. (file:///data/dump/test2 in the example.) Please, can you take a look at it? Why is there a performance drop after 57 sum columns? Welcome to [Spark welcome banner]

Catalog, SessionCatalog and ExternalCatalog in spark 2.0

2016-09-03 Thread Kapil Malik
ame 4. Register it as temp table (for future calls to this table) This is achieved by extending HiveContext, and correspondingly HiveCatalog. I have my own implementation of trait "Catalog", which over-rides the "lookupRelation" method to do the magic behind the scenes. However

Re[4]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-03 Thread Сергей Романов
> 13.405s >> >> >>>Thursday, 1 September 2016, 19:35 +03:00 from Mich Talebzadeh < >>>mich.talebza...@gmail.com >: >>> >>> >>>What happens if you run the following query on its own? How long does it take? >>> >>>SELECT field, 

Re: Re[2]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-02 Thread Mich Talebzadeh
content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 1 September 2016 at 16:55, Сергей Романов > wrote: > > Hi, > > When I run a query like "SELECT field, SUM(

Re[2]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-02 Thread Сергей Романов
SUM(x28), SUM(x29) FROM parquet_table WHERE partition = 1 >>GROUP BY field" it runs in about 12 seconds. >> >>Why does it happen? Can I make the second query run as fast as the first one? I >>tried browsing logs in TRACE mode and comparing CODEGEN but everything looks

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Jakob Odersky
I'm not sure how the shepherd thing works, but just FYI Michael Armbrust originally wrote Catalyst, the engine behind Datasets. You can find a list of all committers here https://cwiki.apache.org/confluence/display/SPARK/Committers. Another good resource is to check https://spark-prs.appspot.com/

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Aris
ark-shell cannot take value > classes, that was an additional confusion to me! > > 2. We have a Spark 2.0 project which is definitely breaking at runtime > with a Dataset of value classes. I am not sure if this is also the case in > Spark 1.6, I'm going to verify. > > Onc

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Aris
Thank you Jakob on two counts 1. Yes, thanks for pointing out that spark-shell cannot take value classes, that was an additional confusion to me! 2. We have a Spark 2.0 project which is definitely breaking at runtime with a Dataset of value classes. I am not sure if this is also the case in

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Jakob Odersky
the REPL. See https://issues.apache.org/jira/browse/SPARK-17367) regards, --Jakob On Thu, Sep 1, 2016 at 1:58 PM, Aris wrote: > Hello Spark community - > > Does Spark 2.0 Datasets *not support* Scala Value classes (basically > "extends AnyVal" with a bunch of limitations) ?

Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Aris
Hello Spark community - Does Spark 2.0 Datasets *not support* Scala Value classes (basically "extends AnyVal" with a bunch of limitations) ? I am trying to do something like this: case class FeatureId(value: Int) extends AnyVal val seq = Seq(FeatureId(1),FeatureId(2),FeatureId(
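A minimal sketch contrasting the value class from the question with a plain case class; whether the AnyVal version encodes at runtime is exactly what this thread is about, and the plain case class is only a possible fallback:

  import org.apache.spark.sql.SparkSession

  object ValueClassCheck {
    case class FeatureId(value: Int) extends AnyVal // the value class from the question
    case class FeatureIdRef(value: Int)             // plain case class fallback

    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("value-class-check").getOrCreate()
      import spark.implicits._

      // the plain case class encodes without issue
      val ds = Seq(FeatureIdRef(1), FeatureIdRef(2), FeatureIdRef(3)).toDS()
      ds.show()
    }
  }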

Re: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-01 Thread Mich Talebzadeh
parquet_table WHERE > partition = 1 GROUP BY field" it runs in about 12 seconds. > > Why does it happen? Can I make the second query run as fast as the first one? I > tried browsing logs in TRACE mode and comparing CODEGEN but everything > looks pretty much the same excluding execution ti

Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-01 Thread Сергей Романов
lated to SPARK-17115 ? I'm using Spark 2.0 Thrift Server over YARN/HDFS with partitioned parquet hive tables. Complete example using beeline: 0: jdbc:hive2://spark-master1.uslicer> DESCRIBE EXTENDED `slicer`.`573_slicer_rnd_13`; col_name,data_type,comment actual_dsp_fee,float,NUL

Re: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-01 Thread Romanov
Can this be related to SPARK-17115 ? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-SQL-runs-5x-times-slower-when-adding-29th-field-to-aggregation-tp27624p27643.html Sent from the Apache Spark User List mailing list archive at Nabble.com. ---

Re: Spark 2.0 - Parquet data with fields containing periods "."

2016-08-31 Thread Don Drake
hub.com/apache/spark/pull/14339 > > On 1 Sep 2016 2:48 a.m., "Don Drake" wrote: > >> I am in the process of migrating a set of Spark 1.6.2 ETL jobs to Spark >> 2.0 and have encountered some interesting issues. >> >> First, it seems the SQL parsing is differe

Re: Spark 2.0 - Parquet data with fields containing periods "."

2016-08-31 Thread Hyukjin Kwon
Hi Don, I guess this should be fixed from 2.0.1. Please refer this PR. https://github.com/apache/spark/pull/14339 On 1 Sep 2016 2:48 a.m., "Don Drake" wrote: > I am in the process of migrating a set of Spark 1.6.2 ETL jobs to Spark > 2.0 and have encountered some interesting i

Spark 2.0 - Parquet data with fields containing periods "."

2016-08-31 Thread Don Drake
I am in the process of migrating a set of Spark 1.6.2 ETL jobs to Spark 2.0 and have encountered some interesting issues. First, it seems the SQL parsing is different, and I had to rewrite some SQL that was doing a mix of inner joins (using where syntax, not inner) and outer joins to get the SQL
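For anyone hitting this, a hedged sketch of the usual backtick quoting for column names that contain dots (the path and column name are made up, and whether this works against Parquet data on 2.0.0 is precisely what the fix referenced in the replies addresses):

  // assuming a SparkSession named spark, and a column literally named "a.b"
  val df = spark.read.parquet("s3://some-bucket/some-path")
  df.select("`a.b`").show()          // backticks tell the parser the dot is part of the name
  df.selectExpr("`a.b` + 1").show()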

Re: Spark 2.0 - Join statement compile error

2016-08-30 Thread shengshanzhang
> scala> s"I'm using $spark in ${spark.version}" > res0: String = I'm using org.apache.spark.sql.SparkSession@1fc1c7e in > 2.1.0-SNAPSHOT > > > Regards, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Master

Re: Spark 2.0 - Join statement compile error

2016-08-30 Thread Jacek Laskowski
. scala> s"I'm using $spark in ${spark.version}" res0: String = I'm using org.apache.spark.sql.SparkSession@1fc1c7e in 2.1.0-SNAPSHOT Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me

Re: Spark 2.0 - Join statement compile error

2016-08-30 Thread Mich Talebzadeh
Actually I double-checked this ‘s’ String Interpolator. In Scala: scala> val chars = "This is Scala" chars: String = This is Scala scala> println($"$chars") This is Scala OK so far fine. In shell (ksh) one can do chars="This is Scala" print "$chars" This is Scala In Shell print "$charsand it is i

Re: Equivalent of "predict" function from LogisticRegressionWithLBFGS in OneVsRest with LogisticRegression classifier (Spark 2.0)

2016-08-29 Thread Nick Pentreath
Try this: val df = spark.createDataFrame(Seq(Vectors.dense(Array(10, 590, 190, 700))).map(Tuple1.apply)).toDF("features") On Sun, 28 Aug 2016 at 11:06 yaroslav wrote: > Hi, > > We use such kind of logic for training our model > > val model = new LogisticRegressionWithLBFGS() > .setNum
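A hedged follow-up to the snippet above: in spark.ml the per-row predict() call is replaced by transform() on a (possibly single-row) DataFrame; ovrModel below is assumed to be an already fitted OneVsRestModel.

  import org.apache.spark.ml.linalg.Vectors

  val df = spark.createDataFrame(Seq(Tuple1(Vectors.dense(10.0, 590.0, 190.0, 700.0)))).toDF("features")
  val predictedClass = ovrModel.transform(df).select("prediction").head().getDouble(0)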

Equivalent of "predict" function from LogisticRegressionWithLBFGS in OneVsRest with LogisticRegression classifier (Spark 2.0)

2016-08-28 Thread yaroslav
Hi, We use such kind of logic for training our model val model = new LogisticRegressionWithLBFGS() .setNumClasses(3) .run(train) Next, during spark streaming, we load model and apply incoming data to this model to get specific class, for example: model.predict(Vectors.dense(1

Re: Spark 2.0 - Join statement compile error

2016-08-28 Thread Mich Talebzadeh
Yes I realised that. Actually I thought it was s not $. it has been around in shell for years say for actual values --> ${LOG_FILE}, for position 's/ etc cat ${LOG_FILE} | egrep -v 'rows affected|return status|&&&' | sed -e 's/^[]*//g' -e 's/^//g' -e '/^$/d' > temp.out Dr Mi

Re: Spark 2.0 - Join statement compile error

2016-08-28 Thread Jacek Laskowski
Hi Mich, This is Scala's string interpolation which allow for replacing $-prefixed expressions with their values. It's what cool kids use in Scala to do templating and concatenation 😁 Jacek On 23 Aug 2016 9:21 a.m., "Mich Talebzadeh" wrote: > What is --> s below before the text of sql? > >
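A tiny illustration of what that interpolator does (names and values are only examples):

  val table = "sales_order_demand"
  val cutoff = "2016-01-01"
  val sql = s"SELECT ORDER_NUMBER FROM $table WHERE SCHEDULE_SHIP_DATE >= '$cutoff'"
  // $table and $cutoff are substituted by Scala before the string is ever handed to spark.sql(sql)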

Re: Please assist: Building Docker image containing spark 2.0

2016-08-27 Thread Marco Mistroni
s >>>>> >>>>> >>>>> RUN ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests >>>>> clean package >>>>> >>>>> kr >>>>> >>>>> >>>>> >>>>>

Re: Please assist: Building Docker image containing spark 2.0

2016-08-27 Thread Marco Mistroni
> >>>> kr >>>> >>>> >>>> On Fri, Aug 26, 2016 at 6:18 PM, Michael Gummelt < >>>> mgumm...@mesosphere.io> wrote: >>>> >>>>> :) >>>>> >>>>> On Thu, Aug 25, 2016 at

Re: Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Mike Metzger
pe it helps > > > > If I join the data between the 2 DFs (based on Product# and item#), I will > get a cartesion join and my result will not be what I want > > > > Thanks for your help > > > > > > *From:* Mike Metzger [mailto:m...@flexiblecreations.com] > *Sen

Re: Please assist: Building Docker image containing spark 2.0

2016-08-26 Thread Mike Metzger
ichael Gummelt >> > wrote: >>> >>>> :) >>>> >>>> On Thu, Aug 25, 2016 at 2:29 PM, Marco Mistroni >>>> wrote: >>>> >>>>> No i wont accept that :) >>>>> I can't believe i have wasted 3 h

Re: Please assist: Building Docker image containing spark 2.0

2016-08-26 Thread Michael Gummelt
>>>> >>>> kr >>>> >>>> On Thu, Aug 25, 2016 at 10:01 PM, Michael Gummelt < >>>> mgumm...@mesosphere.io> wrote: >>>> >>>>> You have a space between "build" and "mvn" >>>>>

RE: Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Subhajit Purkayastha
I want Thanks for your help From: Mike Metzger [mailto:m...@flexiblecreations.com] Sent: Friday, August 26, 2016 2:12 PM To: Subhajit Purkayastha Cc: user @spark Subject: Re: Spark 2.0 - Insert/Update to a DataFrame Without seeing exactly what you were wanting to accomplish, it&#

Re: Please assist: Building Docker image containing spark 2.0

2016-08-26 Thread Tal Grynbaum
>>> mgumm...@mesosphere.io> wrote: >>> >>>> You have a space between "build" and "mvn" >>>> >>>> On Thu, Aug 25, 2016 at 1:31 PM, Marco Mistroni >>>> wrote: >>>> >>>>> HI all >&

Re: Please assist: Building Docker image containing spark 2.0

2016-08-26 Thread Marco Mistroni
ll >>>> sorry for the partially off-topic, i hope there's someone on the list >>>> who has tried the same and encountered similar issues >>>> >>>> Ok so i have created a Docker file to build an ubuntu container which >>>> includes spar

Re: Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Mike Metzger
on the > sales qty (coming from the sales order DF) > > > > Hope it helps > > > > Subhajit > > > > *From:* Mike Metzger [mailto:m...@flexiblecreations.com] > *Sent:* Friday, August 26, 2016 1:13 PM > *To:* Subhajit Purkayastha > *Cc:* user @spark >

RE: Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Subhajit Purkayastha
:13 PM To: Subhajit Purkayastha Cc: user @spark Subject: Re: Spark 2.0 - Insert/Update to a DataFrame Without seeing the makeup of the Dataframes nor what your logic is for updating them, I'd suggest doing a join of the Forecast DF with the appropriate columns from the SalesOrd

Re: Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Mike Metzger
Without seeing the makeup of the Dataframes nor what your logic is for updating them, I'd suggest doing a join of the Forecast DF with the appropriate columns from the SalesOrder DF. Mike On Fri, Aug 26, 2016 at 11:53 AM, Subhajit Purkayastha wrote: > I am using spark 2.0, have 2 Da
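A minimal sketch of that join-based approach (DataFrames are immutable, so the "update" produces a new DataFrame; forecastDF, salesOrderDF and every column name below are assumptions):

  import org.apache.spark.sql.functions._

  val updated = forecastDF.alias("f")
    .join(salesOrderDF.alias("s"),
      col("f.product_id") === col("s.product_id") && col("f.item_id") === col("s.item_id"),
      "left_outer")
    // take the sales qty where a matching order exists, otherwise keep the original forecast
    .withColumn("forecast_qty", coalesce(col("s.sales_qty"), col("f.forecast_qty")))
    .select(col("f.product_id"), col("f.item_id"), col("forecast_qty"))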

Re: spark 2.0 home brew package missing

2016-08-26 Thread RAJESHWAR MANN
Thank you! That was it. 2.0 installed fine after the update. Regards > On Aug 26, 2016, at 1:37 PM, Noorul Islam K M wrote: > > kalkimann writes: > >> Hi, >> spark 1.6.2 is the latest brew package i can find. >> spark 2.0.x brew package is missing, best i kn

Re: spark 2.0 home brew package missing

2016-08-26 Thread Noorul Islam K M
kalkimann writes: > Hi, > spark 1.6.2 is the latest brew package i can find. > spark 2.0.x brew package is missing, best i know. > > Is there a schedule when spark-2.0 will be available for "brew install"? > Did you do a 'brew update' before searching? I i

spark 2.0 home brew package missing

2016-08-26 Thread kalkimann
Hi, spark 1.6.2 is the latest brew package i can find. spark 2.0.x brew package is missing, best i know. Is there a schedule when spark-2.0 will be available for "brew install"? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-2-0

Re: Please assist: Building Docker image containing spark 2.0

2016-08-26 Thread Michael Gummelt
ce between "build" and "mvn" >> >> On Thu, Aug 25, 2016 at 1:31 PM, Marco Mistroni >> wrote: >> >>> HI all >>> sorry for the partially off-topic, i hope there's someone on the list >>> who has tried the same and encountere

Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Subhajit Purkayastha
I am using spark 2.0, have 2 DataFrames, SalesOrder and Forecast. I need to update the Forecast Dataframe record(s), based on the SaleOrder DF record. What is the best way to achieve this functionality

Re: Please assist: Building Docker image containing spark 2.0

2016-08-25 Thread Marco Mistroni
wrote: > >> HI all >> sorry for the partially off-topic, i hope there's someone on the list >> who has tried the same and encountered similar issues >> >> Ok so i have created a Docker file to build an ubuntu container which >> includes spark 2.0, but s

Re: Please assist: Building Docker image containing spark 2.0

2016-08-25 Thread Michael Gummelt
created a Docker file to build an ubuntu container which > includes spark 2.0, but somehow when it gets to the point where it has to > kick off ./build/mvn command, it errors out with the following > > ---> Running in 8c2aa6d59842 > /bin/sh: 1: ./build: Permission denied

Please assist: Building Docker image containing spark 2.0

2016-08-25 Thread Marco Mistroni
HI all sorry for the partially off-topic, i hope there's someone on the list who has tried the same and encountered similar issues Ok so i have created a Docker file to build an ubuntu container which includes spark 2.0, but somehow when it gets to the point where it has to kick off ./buil

Perform an ALS with TF-IDF output (spark 2.0)

2016-08-25 Thread Pasquinell Urbani
Hi there I am performing a product recommendation system for retail. I have been able to compute the TF-IDF of user-items data frame in spark 2.0. Now I need to transform the TF-IDF output in a data frame with columns (user_id, item_id, TF_IDF_ratings) in order to perform an ALS. But I have no
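A minimal sketch of the ALS side once the TF-IDF output has been flattened into (user_id, item_id, TF_IDF_ratings) rows; ratingsDF and its column names are assumptions, and ALS additionally expects the user and item columns to be integer ids:

  import org.apache.spark.ml.recommendation.ALS

  val als = new ALS()
    .setUserCol("user_id")
    .setItemCol("item_id")
    .setRatingCol("TF_IDF_ratings")
    .setImplicitPrefs(true) // TF-IDF scores behave more like implicit feedback than explicit ratings

  val model = als.fit(ratingsDF)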

Re: Spark 2.0 with Kafka 0.10 exception

2016-08-24 Thread Srikanth
ue, Aug 23, 2016 at 4:44 PM, Srikanth wrote: > > Hello, > > > > I'm getting the below exception when testing Spark 2.0 with Kafka 0.10. > > > >> 16/08/23 16:31:01 INFO AppInfoParser: Kafka version : 0.10.0.0 > >> 16/08/23 16:31:01 INFO AppInfoParser: Kafka

Re: dynamic allocation in Spark 2.0

2016-08-24 Thread Saisai Shao
l, > > I am running hadoop 2.6.4 with Spark 2.0 and I have been trying to get > dynamic allocation to work without success. I was able to get it to work > with Spark 1.6.1 however. > > When I issue the command > spark-shell --master yarn --deploy-mode client > > this is

dynamic allocation in Spark 2.0

2016-08-24 Thread Shane Lee
Hello all, I am running hadoop 2.6.4 with Spark 2.0 and I have been trying to get dynamic allocation to work without success. I was able to get it to work with Spark 1.6.1 however. When I issue the command spark-shell --master yarn --deploy-mode client this is the error I see: 16/08/24 00:05:40
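For comparison, a minimal sketch of the settings usually involved; on YARN, dynamic allocation additionally needs Spark's external shuffle service registered as an auxiliary service on every NodeManager (cluster-side configuration not shown here):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("dyn-alloc-example")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .getOrCreate()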

Re: Spark 2.0 with Kafka 0.10 exception

2016-08-23 Thread Cody Koeninger
> I'm getting the below exception when testing Spark 2.0 with Kafka 0.10. > >> 16/08/23 16:31:01 INFO AppInfoParser: Kafka version : 0.10.0.0 >> 16/08/23 16:31:01 INFO AppInfoParser: Kafka commitId : b8642491e78c5a13 >> 16/08/23 16:31:01 INFO CachedKafkaConsumer: Initial f

Spark 2.0 with Kafka 0.10 exception

2016-08-23 Thread Srikanth
Hello, I'm getting the below exception when testing Spark 2.0 with Kafka 0.10. 16/08/23 16:31:01 INFO AppInfoParser: Kafka version : 0.10.0.0 > 16/08/23 16:31:01 INFO AppInfoParser: Kafka commitId : b8642491e78c5a13 > 16/08/23 16:31:01 INFO CachedKafkaConsumer: Initial fetch for >

Re: Spark 2.0 - Join statement compile error

2016-08-23 Thread Mich Talebzadeh
What is --> s below before the text of sql? var sales_order_sql_stmt = s"""SELECT ORDER_NUMBER , INVENTORY_ITEM_ID, ORGANIZATION_ID, from_unixtime(unix_timestamp(SCHEDULE_SHIP_DATE,'yyyy-MM-dd'), 'yyyy-MM-dd') AS schedule_date FROM sales_order_demand WHERE unix_times

Re: Spark 2.0 - Join statement compile error

2016-08-22 Thread Deepak Sharma
On Tue, Aug 23, 2016 at 10:32 AM, Deepak Sharma wrote: > val df = > sales_demand.join(product_master, sales_demand.$"INVENTORY_ITEM_ID" > === product_master.$"INVENTORY_ITEM_ID", "inner") Ignore the last statement. It should look something like this: val df =

Re: Spark 2.0 - Join statement compile error

2016-08-22 Thread Deepak Sharma
Hi Subhajit Try this in your join: val df = sales_demand.join(product_master, sales_demand.$"INVENTORY_ITEM_ID" === product_master.$"INVENTORY_ITEM_ID", "inner") On Tue, Aug 23, 2016 at 2:30 AM, Subhajit Purkayastha wrote: > All, > > > > I have the following dat
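A cleaned-up sketch of the column-equality join being suggested here: === builds a Column predicate, and columns are referenced with df("name") or $"name" rather than as plain fields on the DataFrame.

  val df = sales_demand.join(
    product_master,
    sales_demand("INVENTORY_ITEM_ID") === product_master("INVENTORY_ITEM_ID"),
    "inner")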

Re: Spark 2.0 - Join statement compile error

2016-08-22 Thread Vishal Maru
try putting join condition as String On Mon, Aug 22, 2016 at 5:00 PM, Subhajit Purkayastha wrote: > All, > > > > I have the following dataFrames and the temp table. > > > > I am trying to create a new DF, the following statement is not compiling > > > > val df = sales_demand.j

Spark 2.0 - Join statement compile error

2016-08-22 Thread Subhajit Purkayastha
All, I have the following dataFrames and the temp table. I am trying to create a new DF, the following statement is not compiling val df = sales_demand.join(product_master,(sales_demand.INVENTORY_ITEM_ID==product_master.INVENTORY_ITEM_ID),joinType="inner") What am I do

Re: Spark 2.0 regression when querying very wide data frames

2016-08-22 Thread mhornbech
3.nabble.com/Spark-2-0-regression-when-querying-very-wide-data-frames-tp27567p27571.html > To unsubscribe from Spark 2.0 regression when querying very wide data frames, > click here. > NAML -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread ponkin
I generated CSV file with 300 columns, and it seems to work fine with Spark Dataframes(Spark 2.0). I think you need to post your issue in spark-cassandra-connector community (https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user) - if you are using it. -- View this

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread ponkin
Did you try to load wide, for example, CSV file or Parquet? May be the problem is in spark-cassandra-connector not Spark itself? Are you using spark-cassandra-connector(https://github.com/datastax/spark-cassandra-connector)? -- View this message in context: http://apache-spark-user-list.10015

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread mhornbech
-spark-user-list.1001560.n3.nabble.com/Spark-2-0-regression-when-querying-very-wide-data-frames-tp27567p27569.html > To unsubscribe from Spark 2.0 regression when querying very wide data frames, > click here. > NAML -- View this message in context: http://apache-spark-user-list.100156

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread ponkin
Hi, What kind of datasource do you have? CSV, Avro, Parquet? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-regression-when-querying-very-wide-data-frames-tp27567p27569.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread Sean Owen
Yes, have a look through JIRA in cases like this. https://issues.apache.org/jira/browse/SPARK-16664 On Sat, Aug 20, 2016 at 1:57 AM, mhornbech wrote: > I did some extra digging. Running the query "select column1 from myTable" I > can reproduce the problem on a frame with a single row - it occurs

Re: Spark 2.0 regression when querying very wide data frames

2016-08-19 Thread mhornbech
I did some extra digging. Running the query "select column1 from myTable" I can reproduce the problem on a frame with a single row - it occurs exactly when the frame has more than 200 columns, which smells a bit like a hardcoded limit. Interestingly the problem disappears when replacing the query

Spark 2.0 regression when querying very wide data frames

2016-08-19 Thread mhornbech
. Needless to say that 1500+ columns isn't "desirable", but that's what the client's data looks like and our preference have been to load it and normalize it through Spark. We have been waiting to see how this would work with Spark 2.0, and unfortunately the problem has
