-dev +user
That warning pretty much means what it says: the consumer tried to
get records for the given partition/offset, and couldn't do so after
polling the Kafka broker for X amount of time.
If that only happens when you put additional load on Kafka via
producing, the first thing I'd do is
Instead of *mode="append"*, try *mode="overwrite"*
On Tue, Sep 20, 2016 at 11:30 AM, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:
> Please find the code below.
>
> sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json")
>
> I tried these two commands.
>
You can probably just do an identity transformation on the column to
make its type a nullable String array -- ArrayType(StringType, true).
Of course, I'm not sure why Word2Vec must reject a non-null array type
when it can of course handle nullable, but the previous discussion
indicated that this
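For reference, the identity transformation being suggested might look like the following sketch (the column name "words" is a placeholder for whatever the input column is called):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ArrayType, StringType}

// Re-declare the array column as nullable -- the data is unchanged,
// only the schema's containsNull flag flips to true, which is what
// the Word2Vec schema check expects.
val fixed = df.withColumn("words",
  col("words").cast(ArrayType(StringType, true)))
```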
Hi Frank,
Which version of Spark are you using? Also, can you share more information
about the exception?
If it’s not confidential, you can send the data sample to me
(yuhao.y...@intel.com) and I can try to investigate.
Regards,
Yuhao
From: Frank Zhang [mailto:dataminin...@yahoo.com.INVALID]
I used that one also
On Sep 20, 2016 10:44 PM, "Kevin Mellott" wrote:
> Instead of *mode="append"*, try *mode="overwrite"*
>
> On Tue, Sep 20, 2016 at 11:30 AM, Sankar Mittapally <
> sankar.mittapa...@creditvidya.com> wrote:
>
>> Please find the code below.
>>
>>
Hi Yan, I agree, it IS really confusing. Here is the technique for
transforming a column. It is very general because you can make "myConvert"
do whatever you want.
import org.apache.spark.mllib.linalg.Vectors
val df = Seq((0, "[1,3,5]"), (1, "[2,4,6]")).toDF
df.show()
// The columns were named
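Filled out, the technique might look like this sketch. The column names and the body of "myConvert" are illustrative; the point is that the UDF can do whatever conversion you want:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.functions.udf

val df = Seq((0, "[1,3,5]"), (1, "[2,4,6]")).toDF("id", "features")

// "myConvert" here parses the string form "[1,3,5]" into an mllib
// Vector; substitute whatever transformation your column needs.
val myConvert = udf { s: String =>
  Vectors.dense(
    s.stripPrefix("[").stripSuffix("]").split(",").map(_.toDouble))
}

val converted = df.withColumn("features", myConvert(df("features")))
converted.show()
```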
Hi,
I am trying to create an external table WITH partitions using SPARK.
Currently I am using catalog.createExternalTable(cleanTableName, "ORC",
schema, Map("path" -> s"s3://$location/"))
Does createExternalTable have options that can create a table with
partitions? I assume it would be a
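As far as I know, catalog.createExternalTable doesn't take partition columns directly; a common workaround is plain DDL. A sketch, with hypothetical column names (the partition column must not be repeated in the main column list):

```scala
// Create the partitioned external table via Hive DDL instead of the
// catalog API; column names here are placeholders.
spark.sql(s"""
  CREATE EXTERNAL TABLE $cleanTableName (id STRING, amount DOUBLE)
  PARTITIONED BY (dt STRING)
  STORED AS ORC
  LOCATION 's3://$location/'
""")

// Register the partition directories that already exist at the location.
spark.sql(s"MSCK REPAIR TABLE $cleanTableName")
```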
Can you please post the line of code that is doing the df.write command?
On Tue, Sep 20, 2016 at 9:29 AM, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:
> Hey Kevin,
>
> It is an empty directory. It is able to write part files to the directory,
> but while merging those part files
Please find the code below.
sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json")
I tried these two commands.
write.df(sankar2,"/nfspartition/sankar/test/test.csv","csv",header="true")
saveDF(sankar2,"sankartest.csv",source="csv",mode="append",schema="true")
On Tue, Sep 20, 2016
Hi,
I've been struggling with the reliability of what should, on the face of it,
be a fairly simple job: given a number of events, group them by user, event
type, and the week in which they occurred, and aggregate their counts.
The input data is fairly skewed so I do a repartition and add a salt to the
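For reference, the salting step being described usually looks something like this sketch (column names and the bucket count are hypothetical):

```scala
import org.apache.spark.sql.functions.{col, floor, rand}

// Spread hot keys across 16 salt buckets, aggregate per salted key,
// then aggregate again without the salt to get the final counts.
val salted = events.withColumn("salt", floor(rand() * 16))

val partial = salted
  .groupBy(col("user"), col("eventType"), col("week"), col("salt"))
  .count()

val totals = partial
  .groupBy(col("user"), col("eventType"), col("week"))
  .sum("count")
```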
Can we connect to Cassandra from Spark using the spark-cassandra-connector
when all three are built on the same computer? What kind of problems does
this configuration lead to?
Hi All
I am trying to test a simple Spark APP using scala.
import org.apache.spark.SparkContext

object SparkDemo {
  def main(args: Array[String]) {
    val logFile = "README.md" // Should be some file on your system
    // to run in local mode
    val sc = new SparkContext("local", "Simple
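The snippet above is cut off; a minimal runnable version might look like this (the app name and the line-count action are guesses to make it complete):

```scala
import org.apache.spark.SparkContext

object SparkDemo {
  def main(args: Array[String]): Unit = {
    val logFile = "README.md" // should be some file on your system
    val sc = new SparkContext("local", "SimpleApp") // app name is a guess
    val numLines = sc.textFile(logFile).count()
    println(s"$logFile has $numLines lines")
    sc.stop()
  }
}
```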
How many products do you have? How large are your vectors?
SVD / LSA could be helpful. But if you have many products,
then trying to compute all-pairs similarity with brute force is not going to
be scalable. In this case you may want to investigate hashing (LSH)
techniques.
On
Dear All,
My code works fine with JSON input data. When I tried the Parquet data
format, it worked for English data. For Japanese text, I am getting the
stack trace below. Please help!
Caused by: java.lang.ClassCastException: optional binary element (UTF8) is not
a group
at
(cc'ing dev list also)
I think a more general version of ranking metrics that allows arbitrary
relevance scores could be useful. Ranking metrics are applicable to other
settings like search or other learning-to-rank use cases, so it should be a
little more generic than pure recommender settings.
Hi, all:
I'm using the Spark SQL thrift server under Spark 1.3.1 to run Hive SQL
queries. I started the thrift server like ./sbin/start-thriftserver.sh --master
yarn-client --num-executors 12 --executor-memory 5g --driver-memory 5g, then
sent continuous Hive SQL to the thrift server.
These types of questions would be better asked on the user mailing list for
the Spark Cassandra connector:
http://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
Version compatibility can be found here:
I'm using Spark 2.0.
I've created a dataset from a parquet file, repartitioned it on one of the
columns (docId), and persisted the repartitioned dataset.
val om = ds.repartition($"docId").persist(StorageLevel.MEMORY_AND_DISK)
When I try to confirm the partitioner, with
om.rdd.partitioner
I get
Are you able to manually delete the folder below? I'm wondering if there is
some sort of non-Spark factor involved (permissions, etc).
/nfspartition/sankar/banking_l1_v2.csv
On Tue, Sep 20, 2016 at 12:19 PM, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:
> I used that one also
>
A few options include:
https://github.com/marufaytekin/lsh-spark - I've used this a bit and it
seems quite scalable too from what I've looked at.
https://github.com/soundcloud/cosine-lsh-join-spark - not used this but
looks like it should do exactly what you need.
Hi Yuhao,
Thank you so much for your great contribution to the LDA and other Spark
modules!
I use both Spark 1.6.2 and 2.0.0. The data I originally used is very large,
with tens of millions of documents. But for test purposes, the data set I
mentioned earlier
Probably this: https://issues.apache.org/jira/browse/SPARK-13258
As described in the JIRA, the workaround is to use SPARK_JAVA_OPTS
On Mon, Sep 19, 2016 at 5:07 PM, sagarcasual .
wrote:
> Hello,
> I have my Spark application running in cluster mode in CDH with
>
Thanks Nick - those examples will help a ton!!
On Tue, Sep 20, 2016 at 12:20 PM, Nick Pentreath
wrote:
> A few options include:
>
> https://github.com/marufaytekin/lsh-spark - I've used this a bit and it
> seems quite scalable too from what I've looked at.
>
Related question: is there anything that does scalable matrix
multiplication on Spark? For example, we have that long list of vectors
and want to construct the similarity matrix: v * T(v). In R it would be: v
%*% t(v)
Thanks,
Pete
On Mon, Sep 19, 2016 at 3:49 PM, Kevin Mellott
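On the v %*% t(v) question: mllib's RowMatrix computes the Gramian t(A) * A directly, so one option (under the assumption that the column count is small enough for a local result) is to lay the vectors out as columns rather than rows. A sketch:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// If each row of A is one vector v_i, then v %*% t(v) is A * t(A).
// computeGramianMatrix gives t(A) * A, so build the matrix with the
// vectors as columns instead (i.e. feed it the transpose).
val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(3.0, 4.0),
  Vectors.dense(5.0, 6.0)
))
val mat = new RowMatrix(rows)

// The Gramian comes back as a *local* dense matrix, so this is only
// feasible when the number of columns is modest.
val gram = mat.computeGramianMatrix()
```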
Using the Soundcloud implementation of LSH, I was able to process a 22K
product dataset in a mere 65 seconds! Thanks so much for the help!
On Tue, Sep 20, 2016 at 1:15 PM, Kevin Mellott
wrote:
> Thanks Nick - those examples will help a ton!!
>
> On Tue, Sep 20, 2016
Yeah I can do all operations on that folder
On Sep 21, 2016 12:15 AM, "Kevin Mellott" wrote:
> Are you able to manually delete the folder below? I'm wondering if there
> is some sort of non-Spark factor involved (permissions, etc).
>
>
Spark version, please?
On 21 September 2016 at 09:46, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:
> Yeah I can do all operations on that folder
>
> On Sep 21, 2016 12:15 AM, "Kevin Mellott"
> wrote:
>
>> Are you able to manually delete the folder below?
Hello,
Please add a link on the Spark Community page (
https://spark.apache.org/community.html)
to the Israel Spark Meetup (https://www.meetup.com/israel-spark-users/).
We're an active meetup group, unifying the local Spark user community and
holding regular meetups.
Thanks!
Romi K.
My code works fine with the JSON input format (Spark 1.6 on Amazon EMR,
emr-5.0.0). I tried the Parquet format. It works fine for English data, but
when I tried it with some Japanese text, I got this weird stack trace:
*Caused by: java.lang.ClassCastException: optional binary
Enrich the RDDs first with more information and then map them to a case
class, if you are using Scala.
You can then use the Play API's
(play.api.libs.json.Writes / play.api.libs.json.Json) classes to convert the
mapped case class to JSON.
Thanks
Deepak
On Tue, Sep 20, 2016 at 6:42 PM, sujeet jog
Hi,
I have an RDD of n rows. I want to transform it into a JSON RDD and also
add some more information. Any idea how to accomplish this?
ex:
I have an RDD with n rows with data like below:
16.9527493170273,20.1989561393151,15.7065424947394
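The case-class + play-json approach suggested in the reply might be sketched like this. The case class shape, the "label" field, and the assumption that the RDD holds triples of doubles are all illustrative:

```scala
import org.apache.spark.rdd.RDD
import play.api.libs.json.{Json, Writes}

// Hypothetical case class for one enriched row: the three doubles match
// the sample data; "label" stands in for the extra information to add.
case class Measurement(a: Double, b: Double, c: Double, label: String)

implicit val measurementWrites: Writes[Measurement] = Json.writes[Measurement]

// Assumes rdd: RDD[(Double, Double, Double)].
val jsonRdd: RDD[String] = rdd.map { case (a, b, c) =>
  Json.stringify(Json.toJson(Measurement(a, b, c, "enriched")))
}
```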
Ah, I think that this was supposed to be changed with SPARK-9062. Let
me see about reopening 10835 and addressing it.
On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty
wrote:
> Is this a bug?
>
> On Sep 19, 2016 10:10 PM, "janardhan shetty" wrote:
Thanks Sean.
On Sep 20, 2016 7:45 AM, "Sean Owen" wrote:
> Ah, I think that this was supposed to be changed with SPARK-9062. Let
> me see about reopening 10835 and addressing it.
>
> On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty
> wrote:
> > Is
Have you checked to see if any files already exist at
/nfspartition/sankar/banking_l1_v2.csv? If so, you will need to delete them
before attempting to save your DataFrame to that location. Alternatively, you
may be able to specify the "mode" setting of the df.write operation to
"overwrite",
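In Scala form, the overwrite suggestion looks roughly like this (in SparkR the equivalent is the mode argument of write.df); the path is taken from the earlier message:

```scala
// Overwrite any existing output instead of failing on a non-empty path.
df.write
  .mode("overwrite")
  .option("header", "true")
  .csv("/nfspartition/sankar/test/test.csv")
```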
Hey Kevin,
It is a empty directory, It is able to write part files to the directory
but while merging those part files we are getting above error.
Regards
On Tue, Sep 20, 2016 at 7:46 PM, Kevin Mellott
wrote:
> Have you checked to see if any files already exist at
Is this a bug?
On Sep 19, 2016 10:10 PM, "janardhan shetty" wrote:
> Hi,
>
> I am hitting this issue. https://issues.apache.org/jira/browse/SPARK-10835
> .
>
> Issue seems to be resolved but resurfacing in 2.0 ML. Any workaround is
> appreciated ?
>
> Note:
> Pipeline has