Hi all,
This problem is still bothering me. Here's my setup:
- Ubuntu 16.04
- Java 8
- Spark 2.0
I have launched the following command: ./build/mvn -X -Pyarn -Phadoop-2.7
-DskipTests clean package
and I am getting this exception:
org.apache.maven.lifecycle.LifecycleExecutionException: Fail
Hi, Ashish,
Will take a look at this soon.
Thanks for reporting this,
Xiao
2016-09-29 14:26 GMT-07:00 Ashish Shrowty :
> If I try to inner-join two dataframes which originated from the same initial
> dataframe that was loaded using spark.sql() call, it results in an error -
>
> // reading f
If I try to inner-join two dataframes which originated from the same initial
dataframe that was loaded using spark.sql() call, it results in an error -
// reading from Hive .. the data is stored in Parquet format in Amazon
S3
val d1 = spark.sql("select * from ")
val df1 =
d1.groupBy("
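A minimal sketch of the scenario described above, since the snippet is cut off; the table and column names are hypothetical stand-ins for the elided ones:

import org.apache.spark.sql.functions._

// Hypothetical reproduction: two aggregates derived from the same parent frame.
val d1  = spark.sql("select * from some_hive_table")
val df1 = d1.groupBy("companyid", "loyaltycardnumber").agg(avg("spend").as("avgspend"))
val df2 = d1.groupBy("companyid", "loyaltycardnumber").agg(count(lit(1)).as("txncount"))

// Inner-joining two frames that originate from the same parent is the step that
// reportedly fails; aliasing both sides before the join is one way to disambiguate columns.
val joined = df1.as("a").join(df2.as("b"),
  col("a.companyid") === col("b.companyid") &&
  col("a.loyaltycardnumber") === col("b.loyaltycardnumber"))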
with batch interval in the order of 5-10 mins
2) Data set is large, in the order of millions of records per batch
3) I'm using Spark 2.0
The above implementation doesn't seem efficient at all if the data set
goes through the rows for every count aggregation for computing
attr1Co
FYI, it works when I use MapR configured Spark 2.0. ie
export SPARK_HOME=/opt/mapr/spark/spark-2.0.0-bin-without-hadoop
Thanks
Nirav
On Mon, Sep 26, 2016 at 3:45 PM, Nirav Patel wrote:
> Hi,
>
> I built the zeppelin 0.6 branch using Spark 2.0 with the following mvn command:
>
> mvn clean
Hi,
I built the zeppelin 0.6 branch using Spark 2.0 with the following mvn command:
mvn clean package -Pbuild-distr -Pmapr41 -Pyarn -Pspark-2.0 -Pscala-2.11
-DskipTests
The build was successful.
I only have the following set in zeppelin-conf.sh:
export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.5.1/
export
>>>>> What type of cluster are you running on? YARN? And what distribution?
>>>>>
>>>> On Sun, Sep 4, 2016 at 8:48 PM -0700, "Holden Karau" <
>>>> hol...@pigscanfly.ca> wrote:
>>>>
>>>> You really shouldn't
> I am trying out the new released feature of structured streaming in Spark
> 2.0. I use the Structured Streaming to perform windowing by event time. I
> can print out the result in the console. I would like to write the result
> to Cassandra database through the foreach sink option. I am trying to use
> the spark-cassandra-connector to
>>> On Sun, Sep 4, 2016 at 8:48 PM -0700, "Holden Karau" <
>>> hol...@pigscanfly.ca> wrote:
>>>
>>> You really shouldn't mix different versions of Spark between the master
>>> and worker nodes, if you're
Dear all:
I am trying out the new released feature of structured streaming in Spark
2.0. I use the Structured Streaming to perform windowing by event time. I
can print out the result in the console. I would like to write the result
to Cassandra database through the foreach sink option. I am
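A minimal sketch of the foreach-sink approach being described, assuming a hypothetical keyspace/table and column layout and that the DataStax Java driver is on the classpath; this is just one way to hand-roll a sink with a ForeachWriter:

import org.apache.spark.sql.{ForeachWriter, Row}
import com.datastax.driver.core.{Cluster, Session}

// Hypothetical writer: one Cassandra session per partition, one INSERT per row.
class CassandraSinkWriter(host: String) extends ForeachWriter[Row] {
  var cluster: Cluster = _
  var session: Session = _

  override def open(partitionId: Long, version: Long): Boolean = {
    cluster = Cluster.builder().addContactPoint(host).build()
    session = cluster.connect()
    true
  }

  override def process(row: Row): Unit = {
    // Column names (window_start, word, count) are assumed for illustration.
    session.execute(
      "INSERT INTO demo.word_counts (window_start, word, count) VALUES (?, ?, ?)",
      row.getTimestamp(0), row.getString(1), Long.box(row.getLong(2)))
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (session != null) session.close()
    if (cluster != null) cluster.close()
  }
}

// Usage on a windowed aggregation:
// windowedCounts.writeStream.outputMode("update").foreach(new CassandraSinkWriter("127.0.0.1")).start()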
Hello folks. I recently migrated my application to Spark 2.0, and
everything worked well, except for one function that uses "toDS" and the ML
libraries.
This stage used to complete in 15 minutes or so on 1.6.2, and now takes
almost two hours.
The UI shows very strange behavior - comple
These types of questions would be better asked on the user mailing list for
the Spark Cassandra connector:
http://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
Version compatibility can be found here:
https://github.com/datastax/spark-cassandra-connector#version-compa
Please tell me the configuration, including the most recent versions of
Cassandra, Spark, and the Cassandra Spark connector.
Wish to use the Pivot Table feature of data frame which is available since
Spark 1.6. But the spark of current cluster is version 1.5. Can we install
Spark 2.0 on the master node to work around this?
Thanks!
--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
You really shouldn't mix different versions of Spark between the master
>> and worker nodes; if you're going to upgrade, upgrade all of them. Otherwise
>> you may get very confusing failures.
>>
>> On Monday, September 5, 2016, Rex X wrote:
>>
>>> Wish to use the Pivot Table feature of data frame which is availabl
> On Monday, September 5, 2016, Rex X > wrote:
>
>> Wish to use the Pivot Table feature of data frame which is available
>> since Spark 1.6. But the spark of current cluster is version 1.5. Can we
>> install Spark 2.0 on the master node to work around this?
>>
>
6. But the spark of current cluster is version 1.5. Can we install
Spark 2.0 on the master node to work around this?
Thanks!
>> >> > org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
>> >> >
>> >> > 16/09/07 16:00:02 INFO CoarseGrainedExecutorBackend: Got assigned task
CachedKafkaConsumer: Initial fetch for
> >> > spark-executor-StreamingPixelCount1 mt_event 0 57098866
> >> >
> >> > 16/09/07 16:00:03 INFO Executor: Finished task 1.1 in stage 138.0 (TID
> >> > 7854). 1103 bytes result sent to driver
> On Wed, Aug 24, 2016 at 2:13 PM, Srikanth wrote:
>> >>
>> >> Thanks Cody. Setting poll timeout helped.
>> >> Our network is fine but brokers are not fully provisioned in test
>> >> cluster.
>> >> But there isn't enough load to ma
at running on the same node doesn't have any issues.
> >>
> >> Srikanth
> >>
> >> On Tue, Aug 23, 2016 at 9:52 PM, Cody Koeninger
> >> wrote:
> >>>
> >>> You can set that poll timeout higher with
> >>>
> >>> spark.streaming.kafka.consumer.poll.ms
On Tue, Aug 23, 2016 at 9:52 PM, Cody Koeninger
>> wrote:
>>>
>>> You can set that poll timeout higher with
>>>
>>> spark.streaming.kafka.consumer.poll.ms
>>>
>>> but half a second is fairly generous. I'd try to take a look at
>>> what's going on with your n
spark.streaming.kafka.consumer.poll.ms
>>
>> but half a second is fairly generous. I'd try to take a look at
>> what's going on with your network or kafka broker during that time.
>>
>> On Tue, Aug 23, 2016 at 4:44 PM, Srikanth wrote:
>> > Hello,
>> >
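A minimal sketch of raising the poll timeout discussed above, assuming it is set on the SparkConf before the streaming context is created (it can equally be passed via --conf on spark-submit); the 10-second value is only illustrative:

import org.apache.spark.SparkConf

// Give the Kafka 0.10 consumer more time per poll before the executor gives up.
val conf = new SparkConf()
  .setAppName("kafka-010-stream")
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")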
Hi everyone,
I'd test some algorithms with the Dataset API offered by Spark 2.0.0.
So I was wondering: which is the best way for managing Dataset partitions?
E.g. in the data reading phase, what I usually do is the following:
// RDD
// if I want to set a custom minimum number of partitions
v
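A minimal sketch of the two approaches being contrasted, assuming a hypothetical input path; for Datasets/DataFrames the usual knobs are repartition/coalesce plus spark.sql.shuffle.partitions rather than a read-time minimum:

// RDD: a minimum number of partitions can be requested at read time.
val rdd = spark.sparkContext.textFile("/data/input.txt", 48)

// Dataset: read first, then explicitly repartition (or coalesce to shrink).
val ds = spark.read.textFile("/data/input.txt").repartition(48)

// Shuffle-producing Dataset operations take their output partition count from this setting.
spark.conf.set("spark.sql.shuffle.partitions", "48")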
35 seconds)
With -XX:-DontCompileHugeMethods:
30 rows selected (1.086 seconds)
30 rows selected (1.051 seconds)
30 rows selected (1.073 seconds)
>Wednesday, September 7, 2016, 0:35 +03:00 from Yong Zhang :
>
>This is an interesting point.
>
>I tested with originally data with Spark 2.0 re
This is an interesting point.
I tested with the original data on the Spark 2.0 release, and I get the same
statistics output as in the original email, like the following:
50 1.77695393562
51 0.695149898529
52 0.638142108917
53 0.647341966629
54 0.663456916809
55 0.629166126251
56 0.644149065018
57
I think the slowness is caused by the generated aggregate method having more
than 8K bytecodes, so it is not JIT-compiled and becomes much slower.
Could you try disabling DontCompileHugeMethods with:
-XX:-DontCompileHugeMethods
On Mon, Sep 5, 2016 at 4:21 AM, Сергей Романов
wrote:
> Hi, Gavin,
>
> Shu
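A sketch of one way to apply the JVM flag suggested above to a Spark job; setting it for executors works from the session config, while for the driver it normally has to be given at launch time (the app name is just a placeholder):

import org.apache.spark.sql.SparkSession

// Executors pick this option up when they are launched for the application.
val spark = SparkSession.builder()
  .appName("wide-aggregation-test")
  .config("spark.executor.extraJavaOptions", "-XX:-DontCompileHugeMethods")
  .getOrCreate()

// For the driver JVM, pass the flag at submit time instead, e.g.
//   spark-submit --driver-java-options "-XX:-DontCompileHugeMethods" ...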
Hi, Gavin,
Shuffling is exactly the same in both requests and is minimal. Both requests
produce one shuffle task. Running time is the only difference I can see in the
metrics:
timeit.timeit(spark.read.csv('file:///data/dump/test_csv',
schema=schema).groupBy().sum(*(['dd_convs'] * 57) ).collect,
which is available since
> Spark 1.6. But the spark of current cluster is version 1.5. Can we install
> Spark 2.0 on the master node to work around this?
>
> Thanks!
>
I wish to use the pivot feature of DataFrames, which is available since
Spark 1.6, but the Spark version on the current cluster is 1.5. Can we install
Spark 2.0 on the master node to work around this?
Thanks!
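For reference, a minimal sketch of the DataFrame pivot being asked about, with made-up column names and values (Spark 1.6+ / 2.0):

import org.apache.spark.sql.functions._
import spark.implicits._

// Hypothetical data: (department, quarter, revenue)
val sales = Seq(
  ("toys", "Q1", 100.0), ("toys", "Q2", 150.0),
  ("books", "Q1", 200.0), ("books", "Q2", 180.0)
).toDF("department", "quarter", "revenue")

// Pivot quarters into columns, one row per department.
val pivoted = sales.groupBy("department").pivot("quarter").agg(sum("revenue"))
pivoted.show()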
Please run with -X and post the logs here. We can get the exact error from them.
On Sat, Sep 3, 2016 at 7:24 PM, Marco Mistroni wrote:
> hi all
>
> i am getting failures when building spark 2.0 on Ubuntu 16.06
> Here's details of what i have installed on the ubuntu host
> - j
Any shuffling?
> On Sep 3, 2016, at 5:50 AM, Сергей Романов wrote:
>
> Same problem happens with CSV data file, so it's not parquet-related either.
>
>
> [PySpark welcome banner, Spark version 2.0.0]
This is achieved by extending HiveContext, and correspondingly
>> HiveCatalog. I have my own implementation of trait "Catalog", which
>> over-rides the "lookupRelation" method to do the magic behind the scenes.
>>
>> However, in spark 2.0, I can see foll
HiveCatalog. I have my own implementation of trait "Catalog", which
> over-rides the "lookupRelation" method to do the magic behind the scenes.
>
> However, in spark 2.0, I can see following -
> SessionCatalog - which contains lookupRelation method, but doesn't have
> any in
Hi all,
I am getting failures when building Spark 2.0 on Ubuntu 16.04.
Here are the details of what I have installed on the Ubuntu host:
- Java 8
- Scala 2.11
- git
When I launch the command
./build/mvn -Pyarn -Phadoop-2.7 -DskipTests clean package
everything compiles sort of fine and at the
And an even simpler case:
>>> df = sc.parallelize([1] for x in xrange(760857)).toDF()
>>> for x in range(50, 70): print x, timeit.timeit(df.groupBy().sum(*(['_1'] *
>>> x)).collect, number=1)
50 1.91226291656
51 1.50933384895
52 1.582903862
53 1.90537405014
54 1.84442877769
55 1.9177978
56
Same problem happens with CSV data file, so it's not parquet-related either.
[PySpark welcome banner, Spark version 2.0.0]
Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
SparkSession available as 'spark'.
Hi,
I have narrowed my problem down to a very simple case. I'm sending a 27 KB parquet
file as an attachment (file:///data/dump/test2 in the example).
Please, can you take a look at it? Why is there a performance drop after 57 sum
columns?
[PySpark welcome banner]
ame
4. Register it as temp table (for future calls to this table)
This is achieved by extending HiveContext, and correspondingly HiveCatalog.
I have my own implementation of trait "Catalog", which over-rides the
"lookupRelation" method to do the magic behind the scenes.
However
> 13.405s
>>
>>
>>>Thursday, September 1, 2016, 19:35 +03:00 from Mich Talebzadeh <
>>>mich.talebza...@gmail.com >:
>>>
>>>
>>>What happens if you run the following query on its own. How long it takes?
>>>
>>>SELECT field,
> On 1 September 2016 at 16:55, Сергей Романов
> wrote:
>
> Hi,
>
> When I run a query like "SELECT field, SUM(
SUM(x28), SUM(x29) FROM parquet_table WHERE partition = 1
>>GROUP BY field" it runs in about 12 seconds.
>>
>>Why does this happen? Can I make the second query run as fast as the first one? I
>>tried browsing the logs in TRACE mode and comparing the CODEGEN output, but everything looks
I'm not sure how the shepherd thing works, but just FYI Michael
Armbrust originally wrote Catalyst, the engine behind Datasets.
You can find a list of all committers here
https://cwiki.apache.org/confluence/display/SPARK/Committers. Another
good resource is to check https://spark-prs.appspot.com/
spark-shell cannot take value
> classes, that was an additional confusion to me!
>
> 2. We have a Spark 2.0 project which is definitely breaking at runtime
> with a Dataset of value classes. I am not sure if this is also the case in
> Spark 1.6, I'm going to verify.
>
> Onc
Thank you Jakob on two counts
1. Yes, thanks for pointing out that spark-shell cannot take value classes,
that was an additional confusion to me!
2. We have a Spark 2.0 project which is definitely breaking at runtime with
a Dataset of value classes. I am not sure if this is also the case in
the REPL. See
https://issues.apache.org/jira/browse/SPARK-17367)
regards,
--Jakob
On Thu, Sep 1, 2016 at 1:58 PM, Aris wrote:
> Hello Spark community -
>
> Does Spark 2.0 Datasets *not support* Scala Value classes (basically
> "extends AnyVal" with a bunch of limitations) ?
Hello Spark community -
Does Spark 2.0 Datasets *not support* Scala Value classes (basically
"extends AnyVal" with a bunch of limitations) ?
I am trying to do something like this:
case class FeatureId(value: Int) extends AnyVal
val seq = Seq(FeatureId(1),FeatureId(2),FeatureId(
parquet_table WHERE
> partition = 1 GROUP BY field" it runs in about 12 seconds.
>
> Why does this happen? Can I make the second query run as fast as the first one? I
> tried browsing the logs in TRACE mode and comparing the CODEGEN output, but everything
> looks pretty much the same excluding execution ti
Can this be related to SPARK-17115?
I'm using Spark 2.0 Thrift Server over YARN/HDFS with partitioned parquet hive
tables.
Complete example using beeline:
0: jdbc:hive2://spark-master1.uslicer> DESCRIBE EXTENDED
`slicer`.`573_slicer_rnd_13`;
col_name,data_type,comment
actual_dsp_fee,float,NULL
Can this be related to SPARK-17115 ?
https://github.com/apache/spark/pull/14339
>
> On 1 Sep 2016 2:48 a.m., "Don Drake" wrote:
>
>> I am in the process of migrating a set of Spark 1.6.2 ETL jobs to Spark
>> 2.0 and have encountered some interesting issues.
>>
>> First, it seems the SQL parsing is differe
Hi Don, I guess this should be fixed in 2.0.1.
Please refer to this PR: https://github.com/apache/spark/pull/14339
On 1 Sep 2016 2:48 a.m., "Don Drake" wrote:
> I am in the process of migrating a set of Spark 1.6.2 ETL jobs to Spark
> 2.0 and have encountered some interesting i
I am in the process of migrating a set of Spark 1.6.2 ETL jobs to Spark 2.0
and have encountered some interesting issues.
First, it seems the SQL parsing is different, and I had to rewrite some SQL
that was doing a mix of inner joins (using where syntax, not inner) and
outer joins to get the SQL
> scala> s"I'm using $spark in ${spark.version}"
> res0: String = I'm using org.apache.spark.sql.SparkSession@1fc1c7e in
> 2.1.0-SNAPSHOT
>
>
> Regards,
> Jacek Laskowski
>
> https://medium.com/@jaceklaskowski/ <https://medium.com/@jaceklaskowski/>
> Master
scala> s"I'm using $spark in ${spark.version}"
res0: String = I'm using org.apache.spark.sql.SparkSession@1fc1c7e in
2.1.0-SNAPSHOT
Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me
Actually I double-checked this 's' String Interpolator.
In Scala
scala> val chars = "This is Scala"
chars: String = This is Scala
scala> println($"$chars")
This is Scala
OK, so far so good. In the shell (ksh) I can do:
chars="This is Scala"
print "$chars"
This is Scala
In Shell
print "$charsand it is i
Try this:
// assuming the DataFrame-based ML Vectors; the values must be Doubles
import org.apache.spark.ml.linalg.Vectors
val df = spark.createDataFrame(Seq(Vectors.dense(Array(10.0, 590.0, 190.0,
700.0))).map(Tuple1.apply)).toDF("features")
On Sun, 28 Aug 2016 at 11:06 yaroslav wrote:
> Hi,
>
> We use such kind of logic for training our model
>
> val model = new LogisticRegressionWithLBFGS()
> .setNum
Hi,
We use this kind of logic for training our model:
val model = new LogisticRegressionWithLBFGS()
.setNumClasses(3)
.run(train)
Next, during Spark streaming, we load the model and apply incoming data to it
to get a specific class, for example:
model.predict(Vectors.dense(1
Yes, I realised that. Actually I thought it was s, not $. It has been around
in the shell for years, say for actual values --> ${LOG_FILE}, for position 's/
etc
cat ${LOG_FILE} | egrep -v 'rows affected|return status|&&&' | sed -e
's/^[]*//g' -e 's/^//g' -e '/^$/d' > temp.out
Dr Mi
Hi Mich,
This is Scala's string interpolation, which allows replacing $-prefixed
expressions with their values.
It's what cool kids use in Scala to do templating and concatenation 😁
Jacek
On 23 Aug 2016 9:21 a.m., "Mich Talebzadeh"
wrote:
> What is --> s below before the text of sql?
>
>
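A minimal sketch of the s interpolator being described, using a made-up variable, to show how $-prefixed expressions get substituted:

val tableName = "sales_order_demand"   // hypothetical value
val limit = 10

// The s prefix turns the literal into an interpolated string:
// $name splices a value, ${...} evaluates an arbitrary expression.
val sql = s"SELECT * FROM $tableName LIMIT ${limit * 2}"
// => "SELECT * FROM sales_order_demand LIMIT 20"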
>>>>> RUN ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests
>>>>> clean package
>>>>>
>>>>> kr
>>>> kr
>>>> On Fri, Aug 26, 2016 at 6:18 PM, Michael Gummelt <
>>>> mgumm...@mesosphere.io> wrote:
>>>>
>>>>> :)
>>>>>
>>>>> On Thu, Aug 25, 2016 at
Hope it helps
>
> If I join the data between the 2 DFs (based on Product# and item#), I will
> get a cartesian join and my result will not be what I want
>
> Thanks for your help
>
> From: Mike Metzger [mailto:m...@flexiblecreations.com]
> Sen
>>> Michael Gummelt wrote:
>>>
>>>> :)
>>>>
>>>> On Thu, Aug 25, 2016 at 2:29 PM, Marco Mistroni
>>>> wrote:
>>>>
>>>>> No, I won't accept that :)
>>>>> I can't believe I have wasted 3 h
>>>>
>>>> kr
>>>>
>>>> On Thu, Aug 25, 2016 at 10:01 PM, Michael Gummelt <
>>>> mgumm...@mesosphere.io> wrote:
>>>>
>>>>> You have a space between "build" and "mvn"
>>>>>
I want
Thanks for your help
From: Mike Metzger [mailto:m...@flexiblecreations.com]
Sent: Friday, August 26, 2016 2:12 PM
To: Subhajit Purkayastha
Cc: user @spark
Subject: Re: Spark 2.0 - Insert/Update to a DataFrame
Without seeing exactly what you were wanting to accomplish, it
>>> mgumm...@mesosphere.io> wrote:
>>>
>>>> You have a space between "build" and "mvn"
>>>>
>>>> On Thu, Aug 25, 2016 at 1:31 PM, Marco Mistroni
>>>> wrote:
>>>>
>>>>> HI all
>>>> sorry for the partially off-topic, I hope there's someone on the list
>>>> who has tried the same and encountered similar issues
>>>>
>>>> Ok so I have created a Docker file to build an Ubuntu container which
>>>> includes spar
on the
> sales qty (coming from the sales order DF)
>
>
>
> Hope it helps
>
>
>
> Subhajit
>
>
>
> *From:* Mike Metzger [mailto:m...@flexiblecreations.com]
> *Sent:* Friday, August 26, 2016 1:13 PM
> *To:* Subhajit Purkayastha
> *Cc:* user @spark
>
:13 PM
To: Subhajit Purkayastha
Cc: user @spark
Subject: Re: Spark 2.0 - Insert/Update to a DataFrame
Without seeing the makeup of the Dataframes nor what your logic is for updating
them, I'd suggest doing a join of the Forecast DF with the appropriate columns
from the SalesOrd
Without seeing the makeup of the Dataframes nor what your logic is for
updating them, I'd suggest doing a join of the Forecast DF with the
appropriate columns from the SalesOrder DF.
Mike
On Fri, Aug 26, 2016 at 11:53 AM, Subhajit Purkayastha
wrote:
> I am using spark 2.0, have 2 Da
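A minimal sketch of the join-based "update" Mike describes, with hypothetical frame and column names; since DataFrames are immutable, the updated forecast is produced as a new frame:

import org.apache.spark.sql.functions._

// Assumed shapes: forecast(item_id, period, forecast_qty), salesOrder(item_id, period, sales_qty)
val updatedForecast = forecast
  .join(salesOrder.select("item_id", "period", "sales_qty"), Seq("item_id", "period"), "left")
  // Overwrite the forecast quantity wherever a matching sales order row exists.
  .withColumn("forecast_qty", coalesce(col("sales_qty"), col("forecast_qty")))
  .drop("sales_qty")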
Thank you! That was it. 2.0 installed fine after the update.
Regards
> On Aug 26, 2016, at 1:37 PM, Noorul Islam K M wrote:
>
> kalkimann writes:
>
>> Hi,
>> spark 1.6.2 is the latest brew package i can find.
>> spark 2.0.x brew package is missing, best i kn
kalkimann writes:
> Hi,
> spark 1.6.2 is the latest brew package i can find.
> spark 2.0.x brew package is missing, best i know.
>
> Is there a schedule when spark-2.0 will be available for "brew install"?
>
Did you do a 'brew update' before searching? I i
Hi,
Spark 1.6.2 is the latest brew package I can find.
The Spark 2.0.x brew package is missing, as best I know.
Is there a schedule for when spark-2.0 will be available via "brew install"?
Thanks
You have a space between "build" and "mvn"
>>
>> On Thu, Aug 25, 2016 at 1:31 PM, Marco Mistroni
>> wrote:
>>
>>> HI all
>>> sorry for the partially off-topic, i hope there's someone on the list
>>> who has tried the same and encountere
I am using Spark 2.0 and have 2 DataFrames, SalesOrder and Forecast. I need to
update the Forecast DataFrame record(s) based on the SalesOrder DF record.
What is the best way to achieve this functionality?
wrote:
>
>> HI all
>> sorry for the partially off-topic, i hope there's someone on the list
>> who has tried the same and encountered similar issuse
>>
>> Ok so i have created a Docker file to build an ubuntu container which
>> includes spark 2.0, but s
created a Docker file to build an ubuntu container which
> includes spark 2.0, but somehow when it gets to the point where it has to
> kick off ./build/mvn command, it errors out with the following
>
> ---> Running in 8c2aa6d59842
> /bin/sh: 1: ./build: Permission denied
Hi all,
Sorry for the partially off-topic post; I hope there's someone on the list who
has tried the same and encountered similar issues.
OK, so I have created a Dockerfile to build an Ubuntu container which
includes Spark 2.0, but somehow when it gets to the point where it has to
kick off ./buil
Hi there,
I am building a product recommendation system for retail. I have been
able to compute the TF-IDF of the user-items data frame in Spark 2.0.
Now I need to transform the TF-IDF output into a data frame with columns
(user_id, item_id, TF_IDF_ratings) in order to run ALS. But I have
no
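A minimal sketch of the hand-off being described, assuming a frame named ratings already has the (user_id, item_id, TF_IDF_ratings) columns; ml ALS in Spark 2.0 consumes exactly such user/item/rating columns:

import org.apache.spark.ml.recommendation.ALS

// ratings: DataFrame with integer user_id and item_id and a numeric TF_IDF_ratings column.
val als = new ALS()
  .setUserCol("user_id")
  .setItemCol("item_id")
  .setRatingCol("TF_IDF_ratings")
  .setImplicitPrefs(true)   // treating TF-IDF scores as implicit feedback is an assumption here

val model = als.fit(ratings)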
On Tue, Aug 23, 2016 at 4:44 PM, Srikanth wrote:
> > Hello,
> >
> > I'm getting the below exception when testing Spark 2.0 with Kafka 0.10.
> >
> >> 16/08/23 16:31:01 INFO AppInfoParser: Kafka version : 0.10.0.0
> >> 16/08/23 16:31:01 INFO AppInfoParser: Kafka
Hello all,
>
> I am running hadoop 2.6.4 with Spark 2.0 and I have been trying to get
> dynamic allocation to work without success. I was able to get it to work
> with Spark 1.6.1 however.
>
> When I issue the command
> spark-shell --master yarn --deploy-mode client
>
> this is
Hello all,
I am running Hadoop 2.6.4 with Spark 2.0 and I have been trying to get dynamic
allocation to work, without success. I was able to get it to work with Spark
1.6.1 however.
When I issue the command: spark-shell --master yarn --deploy-mode client
this is the error I see:
16/08/24 00:05:40
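A sketch of the settings dynamic allocation on YARN usually needs, assuming the external shuffle service has also been registered as a YARN auxiliary service on the NodeManagers; the min/max values are only illustrative:

import org.apache.spark.sql.SparkSession

// Dynamic allocation requires the external shuffle service so executors can be
// removed without losing shuffle files.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-test")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  .getOrCreate()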
> I'm getting the below exception when testing Spark 2.0 with Kafka 0.10.
>
>> 16/08/23 16:31:01 INFO AppInfoParser: Kafka version : 0.10.0.0
>> 16/08/23 16:31:01 INFO AppInfoParser: Kafka commitId : b8642491e78c5a13
>> 16/08/23 16:31:01 INFO CachedKafkaConsumer: Initial f
Hello,
I'm getting the below exception when testing Spark 2.0 with Kafka 0.10.
16/08/23 16:31:01 INFO AppInfoParser: Kafka version : 0.10.0.0
> 16/08/23 16:31:01 INFO AppInfoParser: Kafka commitId : b8642491e78c5a13
> 16/08/23 16:31:01 INFO CachedKafkaConsumer: Initial fetch for
>
What is --> s below before the text of sql?
var sales_order_sql_stmt = s"""SELECT ORDER_NUMBER, INVENTORY_ITEM_ID,
ORGANIZATION_ID,
from_unixtime(unix_timestamp(SCHEDULE_SHIP_DATE,'yyyy-MM-dd'),
'yyyy-MM-dd') AS schedule_date
FROM sales_order_demand
WHERE unix_times
On Tue, Aug 23, 2016 at 10:32 AM, Deepak Sharma
wrote:
> val df =
> sales_demand.join(product_master, sales_demand.$"INVENTORY_ITEM_ID"
> === product_master.$"INVENTORY_ITEM_ID", "inner")
Ignore the last statement.
It should look something like this:
val df =
Hi Subhajit
Try this in your join:
val df =
sales_demand.join(product_master, sales_demand.$"INVENTORY_ITEM_ID"
=== product_master.$"INVENTORY_ITEM_ID", "inner")
On Tue, Aug 23, 2016 at 2:30 AM, Subhajit Purkayastha
wrote:
> All,
>
>
>
> I have the following dat
Try putting the join condition as a String.
On Mon, Aug 22, 2016 at 5:00 PM, Subhajit Purkayastha
wrote:
> All,
>
>
>
> I have the following dataFrames and the temp table.
>
>
>
> I am trying to create a new DF, the following statement is not compiling
>
>
>
> val df = sales_demand.j
All,
I have the following dataFrames and the temp table.
I am trying to create a new DF; the following statement is not compiling:
val df = sales_demand.join(product_master,
  (sales_demand.INVENTORY_ITEM_ID == product_master.INVENTORY_ITEM_ID), joinType = "inner")
What am I do
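A minimal sketch of a join that does compile, under the assumption that both frames carry an INVENTORY_ITEM_ID column; column references go through the DataFrame's apply method (or a shared-column-name join), not bare field access:

// Equi-join on the shared column name (keeps a single INVENTORY_ITEM_ID column).
val df = sales_demand.join(product_master, Seq("INVENTORY_ITEM_ID"), "inner")

// Or with an explicit column expression:
val df2 = sales_demand.join(
  product_master,
  sales_demand("INVENTORY_ITEM_ID") === product_master("INVENTORY_ITEM_ID"),
  "inner")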
I generated a CSV file with 300 columns, and it seems to work fine with Spark
DataFrames (Spark 2.0).
I think you need to post your issue to the spark-cassandra-connector community
(https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user)
if you are using it.
Did you try to load a wide CSV or Parquet file, for example? Maybe the
problem is in the spark-cassandra-connector, not Spark itself. Are you using the
spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector)?
Hi,
What kind of datasource do you have? CSV, Avro, Parquet?
Yes, have a look through JIRA in cases like this.
https://issues.apache.org/jira/browse/SPARK-16664
On Sat, Aug 20, 2016 at 1:57 AM, mhornbech wrote:
> I did some extra digging. Running the query "select column1 from myTable" I
> can reproduce the problem on a frame with a single row - it occurs
I did some extra digging. Running the query "select column1 from myTable" I
can reproduce the problem on a frame with a single row - it occurs exactly
when the frame has more than 200 columns, which smells a bit like a
hardcoded limit.
Interestingly the problem disappears when replacing the query
Needless to
say that 1500+ columns isn't "desirable", but that's what the client's data
looks like, and our preference has been to load it and normalize it through
Spark.
We have been waiting to see how this would work with Spark 2.0, and
unfortunately the problem has