
Re: Spark jdbc postgres numeric array

2019-01-05 Thread Alexey
Hi, I also filed a jira yesterday: https://issues.apache.org/jira/browse/SPARK-26538 Looks like one needs to be closed as duplicate. Sorry for the late update. Best regards -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Spark jdbc postgres numeric array

2018-12-31 Thread Alexey
to get array elements of type decimal(38,18) and no error when reading in this case. Should this be considered a bug? Is there a workaround other than changing the column array type definition to include explicit precision and scale? Best regards, Alexey
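A hedged workaround sketch, until the column definition carries explicit precision: push the cast down to Postgres in a subquery so Spark's JDBC schema inference sees concrete decimal elements. The table name, column name, and connection URL below are illustrative, and `spark` is assumed to be a SparkSession.

```scala
// Illustrative only: cast the bare numeric[] column to an explicit
// precision/scale inside a pushed-down subquery, so the inferred element
// type is decimal(38,18) rather than a scale-less numeric.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/mydb")
  .option("dbtable", "(SELECT id, vals::numeric(38,18)[] AS vals FROM t) AS sub")
  .load()
```

The subquery-as-`dbtable` trick is the long-standing way to push projection and casts to the database side.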

mapWithState() without data checkpointing

2016-09-29 Thread Alexey Kharlamov
Hello! I would like to avoid data checkpointing when processing a DStream. Basically, we do not care if the intermediate data are lost. Is there a way to achieve that? Is there an extension point or class embedding all associated activities? Thanks! Sincerely yours, — Alexey Kharlamov
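mapWithState itself insists on a checkpoint directory, so if losing intermediate state is acceptable, one hedged alternative is to bypass Spark's state tracking entirely and fold updates into an external store from foreachRDD. `ExternalStore` below is a hypothetical key-value client, not a real API:

```scala
// Sketch, not a drop-in replacement: keep the running state outside Spark,
// updated per partition. No Spark checkpointing of state is involved; if a
// batch is lost, its updates are simply lost, which matches the stated
// requirement that intermediate data need not survive.
stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val store = ExternalStore.connect()            // hypothetical KV client
    records.foreach { case (k, v) => store.merge(k, v) }
    store.close()
  }
}
```

Another lever, if mapWithState must stay, is to lengthen the checkpoint interval on the state DStream to reduce (not eliminate) checkpointing cost.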

Re: VectorUDT with spark.ml.linalg.Vector

2016-08-16 Thread Alexey Svyatkovskiy
Hi Yanbo, Thanks for your reply. I will keep an eye on that pull request. For now, I decided to just put my code inside org.apache.spark.ml to be able to access private classes. Thanks, Alexey On Tue, Aug 16, 2016 at 11:13 PM, Yanbo Liang <yblia...@gmail.com> wrote: > It seems that

Re: Ideas to put a Spark ML model in production

2016-07-03 Thread Alexey Pechorin
From my personal experience - we're reading the metadata of the features column in the dataframe to extract mapping of the feature indices to the original feature name, and use this mapping to translate the model coefficients into a JSON string that maps the original feature names to their

Re: cache datframe

2016-06-16 Thread Alexey Pechorin
What's the reason for your first cache call? It looks like you've used the data only once to transform it without reusing the data, so there's no reason for the first cache call, and you need only the second call (and that also depends on the rest of your code). On Thu, Jun 16, 2016 at 3:17 PM,

sliding Top N window

2016-03-11 Thread Yakubovich, Alexey
when view expires the length of sliding window…. So my question: does anybody know of / have and can share a piece of code / know-how on how to implement a “sliding Top N window” better. If nothing is offered, I will share what I do myself. Thank you Alexey
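A minimal, Spark-free sketch of the "top N in a sliding window" idea, under the assumption that the window is bucketed by time: keep per-key counts for each bucket, expire the oldest bucket as the window slides, and recompute the top N from the merged counts. All names are illustrative.

```scala
import scala.collection.mutable

// Sliding top-N over time buckets. advance() slides the window by one
// bucket, expiring the oldest; add() counts an occurrence in the current
// bucket; topN merges all live buckets and ranks keys by total count.
class SlidingTopN(windowBuckets: Int, n: Int) {
  private val buckets = mutable.Queue.empty[mutable.Map[String, Long]]

  def advance(): Unit = {
    buckets.enqueue(mutable.Map.empty)
    if (buckets.size > windowBuckets) buckets.dequeue() // oldest bucket expires
  }

  def add(key: String, count: Long = 1L): Unit =
    buckets.last(key) = buckets.last.getOrElse(key, 0L) + count

  def topN: Seq[(String, Long)] = {
    val merged = mutable.Map.empty[String, Long]
    for (b <- buckets; (k, v) <- b) merged(k) = merged.getOrElse(k, 0L) + v
    merged.toSeq.sortBy(-_._2).take(n)
  }
}
```

In a Spark job the same shape appears as reduceByKeyAndWindow plus a top-N step per batch; the class above just isolates the expiry/merge logic.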

[streaming] reading Kafka direct stream throws kafka.common.OffsetOutOfRangeException

2015-09-30 Thread Alexey Ponkin
Hi I have a simple spark-streaming job (8 executors, 1 core, on an 8-node cluster) - read from a Kafka topic (3 brokers with 8 partitions) and save to Cassandra. The problem is that when I increase the number of incoming messages in the topic the job starts to fail with
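OffsetOutOfRangeException from the direct stream usually means the job fell behind Kafka's log retention, so the offsets it asked for were already deleted. Two common mitigations in the Spark 1.4/1.5-era streaming configuration (property names as documented for those releases; tune values to the workload) are to cap per-partition ingest rate and enable backpressure:

```scala
// Illustrative tuning map; these keys go into SparkConf. The rate cap
// bounds how far a single batch can reach into the log, and backpressure
// (Spark 1.5+) adapts the rate to actual processing speed.
val tuning = Map(
  "spark.streaming.backpressure.enabled"       -> "true",
  "spark.streaming.kafka.maxRatePerPartition"  -> "1000" // records/sec, example value
)
```

Raising the topic's retention on the Kafka side is the complementary fix when the consumer genuinely cannot keep up.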

Re: [streaming] DStream with window performance issue

2015-09-08 Thread Alexey Ponkin
Koeninger" <c...@koeninger.org>: >  Can you provide more info (what version of spark, code example)? > >  On Tue, Sep 8, 2015 at 8:18 AM, Alexey Ponkin <alexey.pon...@ya.ru> wrote: >>  Hi, >> >>  I have an application with 2 streams, which are joined together.

[streaming] DStream with window performance issue

2015-09-08 Thread Alexey Ponkin
Hi, I have an application with 2 streams, which are joined together. Stream1 - is a simple DStream (relatively small batch chunks) Stream2 - is a windowed DStream (with a duration of, for example, 60 seconds) Stream1 and Stream2 are Kafka direct streams. The problem is that according to logs window

[streaming] Using org.apache.spark.Logging will silently break task execution

2015-09-06 Thread Alexey Ponkin
Hi, I have the following code object MyJob extends org.apache.spark.Logging{ ... val source: DStream[SomeType] ... source.foreachRDD { rdd => logInfo(s"""+++ForEachRDD+++""") rdd.foreachPartition { partitionOfRecords => logInfo(s"""+++ForEachPartition+++""") } } I
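Two things commonly go wrong here, hedged since the snippet is truncated: referencing logInfo inside the closure pulls the whole MyJob object (with its Logging mixin) into the task closure, and even when that serializes, executor-side logInfo output lands in the executor logs, not the driver console, so it looks silent. A Spark-free sketch of the usual workaround, with java.util.logging standing in for Spark's Logging trait:

```scala
import java.util.logging.Logger

// The logger is @transient lazy, so closures that reference it never
// serialize the logger; each JVM (driver or executor) builds its own on
// first use instead of capturing the enclosing object's logging state.
object MyJob extends Serializable {
  @transient lazy val log: Logger = Logger.getLogger("MyJob")

  // Imitates the function body passed to rdd.foreachPartition: logs
  // locally and consumes the partition's records.
  def processPartition(records: Iterator[String]): Int = {
    log.info("+++ForEachPartition+++")
    records.size
  }
}
```

With this shape, `rdd.foreachPartition(it => MyJob.processPartition(it))` ships only the function, and each executor logs to its own stderr/log4j output.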

Re: Getting number of physical machines in Spark

2015-08-28 Thread Alexey Grishchenko
writing a Spark streaming application to ingest from Kafka with the Receiver API and want to create one DStream per physical machine for read parallelism’s sake. How can I figure out at run time how many machines there are so I know how many DStreams to create? -- Best regards, Alexey
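One hedged way to discover this at runtime in Spark 1.x is SparkContext.getExecutorMemoryStatus, whose keys are "host:port" strings for each registered block manager; distinct hosts approximate physical machines. This is not a stable public API, so treat it as a sketch:

```scala
// Illustrative: count distinct executor hosts. Depending on deploy mode
// the driver's own host may appear in the map, so the result can be off
// by one; filter it out if that matters for sizing the DStream count.
val hosts = sc.getExecutorMemoryStatus.keys
  .map(_.split(":")(0))
  .toSet
val numMachines = hosts.size
```

A small sleep after SparkContext creation may be needed so all executors have registered before the map is read.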

Re: Help Explain Tasks in WebUI:4040

2015-08-28 Thread Alexey Grishchenko
what is going on? Thanks, -- Alexey Grishchenko, http://0x0fff.com

Re: Any quick method to sample rdd based on one filed?

2015-08-28 Thread Alexey Grishchenko
. And hope in the final result, the negative ones could be 10 times more than positive ones. What would be most efficient way to do this? Thanks, -- Best regards, Alexey Grishchenko phone: +353 (87) 262-2154 email: programme...@gmail.com web: http://0x0fff.com

Re: Calculating Min and Max Values using Spark Transformations?

2015-08-28 Thread Alexey Grishchenko
-- Best regards, Alexey Grishchenko phone: +353 (87) 262-2154 email: programme...@gmail.com web: http://0x0fff.com

Unsupported major.minor version 51.0

2015-08-11 Thread Yakubovich, Alexey
java.lang.UnsupportedClassVersionError: org/apache/maven/cli/MavenCli : Unsupported major.minor version 51.0 Please help with how to build this. Thanks Alexey This message, including any attachments, is the property of Sears Holdings Corporation and/or one of its subsidiaries
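For reference, class-file major version N maps to Java release N - 44, so major version 51 is Java 7: this error means Maven classes built for Java 7 are being run on an older JVM (Java 6), and the fix is to point the build at a JDK 7 or newer.

```scala
// Class-file major versions: 50 -> Java 6, 51 -> Java 7, 52 -> Java 8.
// The mapping is simply (major - 44) for modern class files.
def javaReleaseOf(classFileMajor: Int): Int = classFileMajor - 44
```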

Can't build Spark 1.3

2015-06-02 Thread Yakubovich, Alexey
clean package Is it only me who can’t build Spark 1.3? And, is there any site to download Spark prebuilt for Hadoop 2.5 and Hive? Thank you for any help. Alexey

Re: Is it possible to use json4s 3.2.11 with Spark 1.3.0?

2015-03-29 Thread Alexey Zinoviev
somehow. Can you double check that and remove the Scala classes from your app if they're there? On Mon, Mar 23, 2015 at 10:07 PM, Alexey Zinoviev alexey.zinov...@gmail.com wrote: Thanks Marcelo, this option solved the problem (I'm using 1.3.0), but it works only if I remove extends Logging from

Re: Is it possible to use json4s 3.2.11 with Spark 1.3.0?

2015-03-23 Thread Alexey Zinoviev
<version>3.2.10</version> </dependency> The version is hard coded. You can rebuild Spark 1.3.0 with json4s 3.2.11 Cheers On Mon, Mar 23, 2015 at 2:12 PM, Alexey Zinoviev alexey.zinov...@gmail.com wrote: Spark has a dependency on json4s 3.2.10, but this version has several bugs and I need

Re: Is it possible to use json4s 3.2.11 with Spark 1.3.0?

2015-03-23 Thread Alexey Zinoviev
it with spark-1.3.0/bin/spark-submit --class App1 --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true $HOME/projects/sparkapp/target/scala-2.10/sparkapp-assembly-1.0.jar Thanks, Alexey On Tue, Mar 24, 2015 at 5:03 AM, Marcelo Vanzin van...@cloudera.com wrote: You
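Besides the userClassPathFirst flags shown above, another approach from that era is to shade json4s inside the application's fat jar so it can never clash with Spark's copy. A hedged build.sbt fragment, assuming a version of sbt-assembly new enough to support shade rules (the renamed package prefix is arbitrary):

```scala
// build.sbt sketch: relocate the application's json4s 3.2.11 classes so
// both they and Spark's bundled 3.2.10 can coexist on the classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.json4s.**" -> "shaded.json4s.@1").inAll
)
```

Shading avoids the occasional side effects of userClassPathFirst (such as the `extends Logging` interaction mentioned in this thread) at the cost of a slightly slower assembly step.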

Is it possible to use json4s 3.2.11 with Spark 1.3.0?

2015-03-23 Thread Alexey Zinoviev
usage? Thanks, Alexey

Re: Mathematical functions in spark sql

2015-01-26 Thread Alexey Romanchuk
I have tried select ceil(2/3), but got key not found: floor On Tue, Jan 27, 2015 at 11:05 AM, Ted Yu yuzhih...@gmail.com wrote: Have you tried floor() or ceil() functions ? According to http://spark.apache.org/sql/, Spark SQL is compatible with Hive SQL. Cheers On Mon, Jan 26, 2015 at
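The "key not found: floor" error suggests the plain SQLContext parser of that era simply had no registered math functions (they arrived in later releases, and HiveContext exposed Hive's UDF versions earlier). As a stopgap, the same computation is available on the Scala side:

```scala
// Equivalent driver-side computation while SQL-level ceil/floor are
// unavailable: scala.math operates on doubles directly.
val x = 2.0 / 3
val up   = math.ceil(x)  // smallest integer >= x
val down = math.floor(x) // largest integer <= x
```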

Re: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down

2014-12-02 Thread Alexey Romanchuk
Any ideas? Anyone got the same error? On Mon, Dec 1, 2014 at 2:37 PM, Alexey Romanchuk alexey.romanc...@gmail.com wrote: Hello spark users! I found lots of strange messages in driver log. Here it is: 2014-12-01 11:54:23,849 [sparkDriver-akka.actor.default-dispatcher-25] ERROR

akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down

2014-12-01 Thread Alexey Romanchuk
Hello spark users! I found lots of strange messages in driver log. Here it is: 2014-12-01 11:54:23,849 [sparkDriver-akka.actor.default-dispatcher-25] ERROR

Delayed hotspot optimizations in Spark

2014-10-10 Thread Alexey Romanchuk
Hello spark users and developers! I am using hdfs + spark sql + hive schema + parquet as storage format. I have a lot of parquet files - one file fits one hdfs block for one day. The strange thing is a very slow first query for spark sql. To reproduce the situation I use only one core and I have 97sec

Re: Delayed hotspot optimizations in Spark

2014-10-10 Thread Alexey Romanchuk
the upfront compilation really helps. I doubt it. However, isn't this almost surely due to caching somewhere, in Spark SQL or HDFS? I really doubt hotspot makes a difference compared to these much larger factors. On Fri, Oct 10, 2014 at 8:49 AM, Alexey Romanchuk alexey.romanc...@gmail.com wrote

Re: Log hdfs blocks sending

2014-09-26 Thread Alexey Romanchuk
- https://gist.github.com/13h3r/6e5053cf0dbe33f2 Do you have any idea where to look at? Thanks! On Fri, Sep 26, 2014 at 10:35 AM, Andrew Ash and...@andrewash.com wrote: Hi Alexey, You should see in the logs a locality measure like NODE_LOCAL, PROCESS_LOCAL, ANY, etc. If your Spark workers

Log hdfs blocks sending

2014-09-25 Thread Alexey Romanchuk
Hello again spark users and developers! I have a standalone spark cluster (1.1.0) and spark sql running on it. My cluster consists of 4 datanodes and the replication factor of files is 3. I use the thrift server to access spark sql and have 1 table with 30+ partitions. When I run a query on the whole table