RE: Spark application fails with numRecords error
Hi, I checked the following threads but I am still not sure whether it is misuse, a common issue, or a bug:
https://stackoverflow.com/questions/34989539/spark-streaming-from-kafka-has-error-numrecords-must-not-be-negative
https://stackoverflow.com/questions/41319530/why-does-spark-streaming-application-with-kafka-fail-with-requirement-failed-n
https://forums.databricks.com/questions/11055/how-to-resolve-illegalargumentexception-requiremen.html

From: Prem Sure [mailto:sparksure...@gmail.com]
Sent: Wednesday, November 1, 2017 8:11 PM
To: Serkan TAS
Cc: user@spark.apache.org
Subject: Re: Spark application fails with numRecords error

Hi, is there any offset left over for new topic consumption? One case is that the stored offset is beyond the current latest offset, causing a negative count. Also make sure the Kafka brokers are healthy and up; that can sometimes be a cause as well.

On Wed, Nov 1, 2017 at 11:40 AM, Serkan TAS wrote:
Hi, I searched for the error in Kafka, but in the end I think it is related to Spark, not Kafka. Has anyone faced an exception that terminates the program with the error "numRecords must not be negative" while streaming? Thanks in advance. Regards.

This communication may contain information that is legally privileged, confidential or exempt from disclosure. If you are not the intended recipient, please note that any dissemination, distribution, or copying of this communication is strictly prohibited. Anyone who receives this message in error should notify the sender immediately by telephone or by return communication and delete it from his or her computer. Only the person who has sent this message is responsible for its content.
Re: Fwd: Does PySpark support Python 3.6?
I'm not sure whether PySpark officially supports Python 3.6, but PySpark and Python 3.6 are working in my environment. I found the following issue and it seems to have already been resolved: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19019

On 2017/11/02 at 11:54 AM, "Jun Shi" wrote:
Dear Spark developers: It's so exciting to send this email to you. I have run into the question of whether PySpark supports Python 3.6 (some answers I found online say no). Can you tell me which Python versions PySpark supports? I'm looking forward to your answer. Thank you very much! Best, Jun
RE: Does PySpark support Python 3.6?
Dear Spark users,

I have been asked to provide a presentation / business case as to why to use Spark and Java as the ingestion tool for HDFS and Hive, and why to move away from an ETL tool. Could you be so kind as to provide some pros and cons for this? I have the following:

Pros:
- Built in-house: code can be changed on the fly to suit business needs
- Software is free
- Can run on all nodes out of the box
- Will support all Apache-based software
- Fast due to in-memory processing
- Spark UI can visualise execution
- Supports checkpointed data loads
- Supports a schema registry for custom schemas and inference
- Supports YARN execution
- MLlib can be used if needed
- Data lineage support due to Spark usage

Cons:
- Skills needed to maintain and build
- In-memory capability can become a bottleneck if not managed
- No ETL GUI

Maybe point me to an article if you have one. Thanks a million.

Christian

Standard Bank email disclaimer and confidentiality note: Please go to www.standardbank.co.za/site/homepage/emaildisclaimer.html to read our email disclaimer and confidentiality note. Kindly email disclai...@standardbank.co.za (no content or subject line necessary) if you cannot view that page and we will email our email disclaimer and confidentiality note to you.
Fwd: Does PySpark support Python 3.6?
Dear Spark developers: It's so exciting to send this email to you. I have run into the question of whether PySpark supports Python 3.6 (some answers I found online say no). Can you tell me which Python versions PySpark supports? I'm looking forward to your answer. Thank you very much! Best, Jun
Re: Writing custom Structured Streaming receiver
Structured Streaming source APIs are not yet public, so there isn't a guide. However, if you are adventurous enough, you can take a look at the source code in Spark.

Source API: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala
Text socket source implementation: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala

Note that these APIs are still internal APIs and are very likely to change in future versions of Spark.

On Wed, Nov 1, 2017 at 5:45 PM, Daniel Haviv wrote:
> Hi,
> Is there a guide to writing a custom Structured Streaming receiver?
>
> Thank you.
> Daniel
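For the adventurous, the shape of that internal API can be illustrated with a small, self-contained analogue. The sketch below borrows the method names of Spark's internal Source trait (schema, getOffset, getBatch, stop) but depends on nothing outside the Scala standard library; it is an assumption-laden stand-in, not the real API — a real implementation returns Spark's StructType, Offset, and DataFrame types instead of the simplified ones used here.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for Spark's Offset subclasses.
final case class LongOffset(offset: Long)

// A Source-like trait mirroring the method names of Spark's internal
// org.apache.spark.sql.execution.streaming.Source (simplified types).
trait ToySource {
  def schema: Seq[String]                        // column names only, for this sketch
  def getOffset: Option[LongOffset]              // highest offset available, if any
  def getBatch(start: Option[LongOffset], end: LongOffset): Seq[String]
  def stop(): Unit
}

// An in-memory source backed by a growable buffer, playing the role the
// socket plays in Spark's text socket source.
final class BufferSource extends ToySource {
  private val rows = ArrayBuffer.empty[String]

  def add(row: String): Unit = rows += row

  def schema: Seq[String] = Seq("value")

  // No data yet -> no offset; otherwise the count of rows seen so far.
  def getOffset: Option[LongOffset] =
    if (rows.isEmpty) None else Some(LongOffset(rows.size))

  // Return only the rows between the last processed offset and `end`.
  def getBatch(start: Option[LongOffset], end: LongOffset): Seq[String] = {
    val from = start.map(_.offset).getOrElse(0L).toInt
    rows.slice(from, end.offset.toInt).toSeq
  }

  def stop(): Unit = rows.clear()
}
```

The engine repeatedly calls getOffset to discover new data, then getBatch with the last committed offset to pull only the delta; that offset-based contract is the core idea of the internal API, though the concrete types will differ and may change between Spark versions.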
Writing custom Structured Streaming receiver
Hi, Is there a guide to writing a custom Structured Streaming receiver? Thank you. Daniel
Re: Spark application fails with numRecords error
Hi, is there any offset left over for new topic consumption? One case is that the stored offset is beyond the current latest offset, causing a negative count. Also make sure the Kafka brokers are healthy and up; that can sometimes be a cause as well.

On Wed, Nov 1, 2017 at 11:40 AM, Serkan TAS wrote:
> Hi,
>
> I searched for the error in Kafka, but in the end I think it is related to Spark, not Kafka.
>
> Has anyone faced an exception that terminates the program with the error "numRecords must not be negative" while streaming?
>
> Thanks in advance.
>
> Regards.
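The failure mode described above (a checkpointed offset beyond the topic's current latest offset) can be seen in miniature below. This is a hedged sketch, not Spark's code: OffsetRange here is a plain case class standing in for the one in the Kafka integration, and the require mirrors the assertion that produces the "numRecords must not be negative" message.

```scala
// Stand-in for the offset range the Kafka direct stream computes per
// partition for each micro-batch (hypothetical simplification).
final case class OffsetRange(topic: String, partition: Int,
                             fromOffset: Long, untilOffset: Long) {
  def count: Long = untilOffset - fromOffset
}

// Mirrors the check that fails with "numRecords must not be negative":
// if a stale checkpoint says we already consumed up to offset 500 but the
// (recreated) topic's latest offset is only 100, the count goes negative.
def numRecords(range: OffsetRange): Long = {
  require(range.count >= 0, s"numRecords must not be negative: $range")
  range.count
}

val healthy = OffsetRange("events", 0, fromOffset = 90, untilOffset = 100)
val stale   = OffsetRange("events", 0, fromOffset = 500, untilOffset = 100)
```

Deleting and recreating a topic without clearing the checkpoint directory is a typical way to end up in the stale state; clearing the checkpoint, or starting the stream with fresh offsets, avoids the negative count.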
Announcing Spark on Kubernetes release 0.5.0
The Spark on Kubernetes development community is pleased to announce release 0.5.0 of Apache Spark with Kubernetes as a native scheduler back-end! This release includes a few bug fixes and the following features:

- SparkR support
- Kubernetes 1.8 support
- Mounts emptyDir volumes for temporary directories on executors in static allocation mode

The full release notes are available here: https://github.com/apache-spark-on-k8s/spark/releases/tag/v2.2.0-kubernetes-0.5.0

Community resources for Spark on Kubernetes are available at:
- Slack: https://kubernetes.slack.com
- User Docs: https://apache-spark-on-k8s.github.io/userdocs/
- GitHub: https://github.com/apache-spark-on-k8s/spark
Re: Read parquet files as buckets
Hi, what about the DAG — can you send that as well, from the resulting "write" call?

On Wed, Nov 1, 2017 at 5:44 AM, אורן שמון wrote:
> The version is 2.2.0. The code for the write is:
>
> sortedApiRequestLogsDataSet.write
>   .bucketBy(numberOfBuckets, "userId")
>   .mode(SaveMode.Overwrite)
>   .format("parquet")
>   .option("path", outputPath + "/")
>   .option("compression", "snappy")
>   .saveAsTable("sorted_api_logs")
>
> And the code for the read:
>
> val df = sparkSession.read.parquet(path).toDF()
>
> The read code runs on a different cluster than the write.
>
> On Tue, Oct 31, 2017 at 7:02 PM Michael Artz wrote:
>> What version of Spark? Do you have a code sample? A screenshot of the DAG or the printout from .explain?
>>
>> On Tue, Oct 31, 2017 at 11:01 AM, אורן שמון wrote:
>>> Hi all,
>>> I have Parquet files resulting from a job that saved them bucketed by userId. How can I read the files in bucketed mode in another job? I tried to read them, but the data was not bucketed (same user in same partition).
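One likely explanation, offered as an assumption rather than a confirmed diagnosis: saveAsTable records the bucketing metadata in the metastore, so reading the raw files with spark.read.parquet(path) on another cluster (with a different metastore) loses the bucket information; reading via spark.read.table("sorted_api_logs") against the same metastore should preserve it. The self-contained sketch below only illustrates the bucketing idea itself — rows are assigned to buckets by hashing the bucket column modulo the bucket count, so writer and reader must agree on the same function and bucket count for co-location to hold. Note this uses String.hashCode for simplicity; Spark's real implementation uses a Murmur3-based hash.

```scala
// Assign a row to a bucket by hashing the bucket column modulo the bucket
// count, kept non-negative. This mimics the idea behind
// DataFrameWriter.bucketBy, not Spark's actual hash function.
def bucketFor(userId: String, numBuckets: Int): Int = {
  require(numBuckets > 0, "numBuckets must be positive")
  val h = userId.hashCode % numBuckets
  if (h < 0) h + numBuckets else h
}

// Group (userId, payload) rows into buckets, as a writer would when
// producing one file group per bucket.
def bucketize(rows: Seq[(String, String)],
              numBuckets: Int): Map[Int, Seq[(String, String)]] =
  rows.groupBy { case (userId, _) => bucketFor(userId, numBuckets) }
```

Because the assignment is deterministic, every row for a given userId always lands in the same bucket — which is exactly the property lost when the reader ignores the bucket metadata and treats the files as plain Parquet.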
Logistic regression in Spark TestCase
Hi, Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk (as shown in the graph on the Spark website). What kind of test dataset and cluster configuration produce those results — does anyone know? And where can I get the test dataset? Thanks in advance. Best regards.
Spark application fails with numRecords error
Hi, I searched for the error in Kafka, but in the end I think it is related to Spark, not Kafka. Has anyone faced an exception that terminates the program with the error "numRecords must not be negative" while streaming? Thanks in advance. Regards.