RE: Spark application fails with numRecords error

2017-11-01 Thread Serkan TAS
Hi,

I checked the following threads, but I am still not sure whether this is a
misuse, a common issue, or a bug.

https://stackoverflow.com/questions/34989539/spark-streaming-from-kafka-has-error-numrecords-must-not-be-negative

https://stackoverflow.com/questions/41319530/why-does-spark-streaming-application-with-kafka-fail-with-requirement-failed-n

https://forums.databricks.com/questions/11055/how-to-resolve-illegalargumentexception-requiremen.html



From: Prem Sure [mailto:sparksure...@gmail.com]
Sent: Wednesday, November 1, 2017 8:11 PM
To: Serkan TAS 
Cc: user@spark.apache.org
Subject: Re: Spark application fails with numRecords error

Hi, is there any offset left over from earlier consumption of the topic? The
case can be that the stored offset is beyond the current latest offset, which
causes the negative value. Also check that the Kafka brokers are healthy and
up; that can sometimes be a reason as well.
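The failure mode described above can be sketched numerically. The helper below is an illustration of the arithmetic implied by the error message (an assumption, not Spark's actual source): the per-partition batch size is the difference between the ending and starting offsets, which goes negative when a stored offset is ahead of the broker's latest offset.

```python
def num_records(from_offset: int, until_offset: int) -> int:
    """Per-partition record count for a micro-batch (illustrative only)."""
    return until_offset - from_offset

# Normal case: the stored offset trails the broker's latest offset.
print(num_records(from_offset=100, until_offset=150))  # 50

# Failure case: a checkpointed offset for a topic that was deleted and
# recreated sits beyond the broker's current latest offset.
print(num_records(from_offset=150, until_offset=100))  # -50
```

Clearing the checkpoint directory or starting with a fresh consumer group are the usual remedies suggested in the linked threads.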

On Wed, Nov 1, 2017 at 11:40 AM, Serkan TAS wrote:
Hi,

I searched for the error in Kafka, but I think it is ultimately related to
Spark, not Kafka.

Has anyone faced an exception that terminates the program with the error
"numRecords must not be negative" while streaming?

Thanks in advance.

Regards.




This communication may contain information that is legally privileged, 
confidential or exempt from disclosure. If you are not the intended recipient, 
please note that any dissemination, distribution, or copying of this 
communication is strictly prohibited. Anyone who receives this message in error 
should notify the sender immediately by telephone or by return communication 
and delete it from his or her computer. Only the person who has sent this 
message is responsible for its content.






Re: Fwd: Does pyspark support python3.6?

2017-11-01 Thread makoto
I'm not sure whether PySpark officially supports Python 3.6, but PySpark and
Python 3.6 are working in my environment.

I found the following issue, and it seems to have already been resolved:

https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19019
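For readers who want to guard a job at startup, here is a small hedged sketch; the version cutoffs are assumptions drawn from SPARK-19019 (fixed for Spark 2.1.1+) and the Spark 2.2 documentation, not an authoritative compatibility matrix:

```python
import sys

def pyspark_supported(version=None):
    """Rough gate: Python 2.7 or Python 3.4+ (assumed from the Spark 2.2 docs)."""
    major, minor = (version or sys.version_info)[:2]
    return (major == 2 and minor == 7) or (major == 3 and minor >= 4)

print(pyspark_supported((3, 6)))  # True once SPARK-19019 is fixed
print(pyspark_supported((2, 6)))  # False
```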

On 2017/11/02 at 11:54 AM, "Jun Shi" wrote:



Dear Spark developers:
   It's so exciting to send this email to you.
   I have run into the question of whether PySpark supports Python 3.6
(some answers I found online say no). Could you tell me which Python
versions PySpark supports?
   I'm looking forward to your answer. Thank you very much!

Best,
Jun


RE: Does pyspark support python3.6?

2017-11-01 Thread van den Heever, Christian CC
Dear Spark users

I have been asked to provide a presentation / business case for why to use
Spark and Java as the ingestion tool for HDFS and Hive, and why to move away
from an ETL tool.

Could you be so kind as to provide some pros and cons for this?

I have the following :

Pros:
In-house build – code can be changed on the fly to suit business needs.
Software is free.
Can run on all nodes out of the box.
Supports all Apache-based software.
Fast due to in-memory processing.
Spark UI can visualise execution.
Supports checkpointed data loads.
Supports a schema registry for custom schemas and inference.
Supports YARN execution.
MLlib can be used if needed.
Data lineage support due to Spark usage.

Cons:
Skills needed to maintain and build.
In-memory capability can become a bottleneck if not managed.
No ETL GUI.

Maybe point me to an article if you have one.

Thanks a mill.
Christian

Standard Bank email disclaimer and confidentiality note
Please go to www.standardbank.co.za/site/homepage/emaildisclaimer.html to read 
our email disclaimer and confidentiality note. Kindly email 
disclai...@standardbank.co.za (no content or subject line necessary) if you 
cannot view that page and we will email our email disclaimer and 
confidentiality note to you.


Fwd: Does pyspark support python3.6?

2017-11-01 Thread Jun Shi
Dear Spark developers:
   It's so exciting to send this email to you.
   I have run into the question of whether PySpark supports Python 3.6
(some answers I found online say no). Could you tell me which Python
versions PySpark supports?
   I'm looking forward to your answer. Thank you very much!

Best,
Jun


Re: Writing custom Structured Streaming receiver

2017-11-01 Thread Tathagata Das
Structured Streaming source APIs are not yet public, so there isn't a guide.
However, if you are adventurous enough, you can take a look at the source
code in Spark.
Source API:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala
Text socket source implementation:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala

Note that these APIs are still internal APIs and are very likely to change
in future versions of Spark.


On Wed, Nov 1, 2017 at 5:45 PM, Daniel Haviv  wrote:

> Hi,
> Is there a guide to writing a custom Structured Streaming receiver?
>
> Thank you.
> Daniel
>


Writing custom Structured Streaming receiver

2017-11-01 Thread Daniel Haviv
Hi,
Is there a guide to writing a custom Structured Streaming receiver?

Thank you.
Daniel


Re: Spark application fails with numRecords error

2017-11-01 Thread Prem Sure
Hi, is there any offset left over from earlier consumption of the topic? The
case can be that the stored offset is beyond the current latest offset, which
causes the negative value. Also check that the Kafka brokers are healthy and
up; that can sometimes be a reason as well.

On Wed, Nov 1, 2017 at 11:40 AM, Serkan TAS  wrote:

> Hi,
>
>
>
> I searched for the error in Kafka, but I think it is ultimately related to
> Spark, not Kafka.
>
>
>
> Has anyone faced an exception that terminates the program with the error
> "numRecords must not be negative" while streaming?
>
>
>
> Thanks in advance.
>
>
>
> Regards.
>


Announcing Spark on Kubernetes release 0.5.0

2017-11-01 Thread Yinan Li
The Spark on Kubernetes development community is pleased to announce
release 0.5.0
of Apache Spark with Kubernetes as a native scheduler back-end!

This release includes a few bug fixes and the following features:

   - Spark R support
   - Kubernetes 1.8 support
   - Mounts emptyDir volumes for temporary directories on executors in
   static allocation mode

The full release notes are available here:
https://github.com/apache-spark-on-k8s/spark/releases/tag/v2.2.0-kubernetes-0.5.0

Community resources for Spark on Kubernetes are available at:

   - Slack: https://kubernetes.slack.com
   - User Docs: https://apache-spark-on-k8s.github.io/userdocs/
   - GitHub: https://github.com/apache-spark-on-k8s/spark


Re: Read parquet files as buckets

2017-11-01 Thread Michael Artz
Hi,
   What about the DAG from the resulting "write" call — can you send that as
well?

On Wed, Nov 1, 2017 at 5:44 AM, אורן שמון  wrote:

> The version is 2.2.0 .
> The code for the write is :
> sortedApiRequestLogsDataSet.write
>   .bucketBy(numberOfBuckets, "userId")
>   .mode(SaveMode.Overwrite)
>   .format("parquet")
>   .option("path", outputPath + "/")
>   .option("compression", "snappy")
>   .saveAsTable("sorted_api_logs")
>
> And code for the read :
> val df = sparkSession.read.parquet(path).toDF()
>
> The read code runs on a different cluster than the write job.
>
>
>
>
> On Tue, Oct 31, 2017 at 7:02 PM Michael Artz 
> wrote:
>
>> What version of spark?  Do you have code sample?  Screen shot of the DAG
>> or the printout from .explain?
>>
>> On Tue, Oct 31, 2017 at 11:01 AM, אורן שמון 
>> wrote:
>>
>>> Hi all,
>>> I have Parquet files produced by a job that saved them in bucketed mode
>>> by userId. How can I read the files in bucketed mode in another job? I
>>> tried to read them, but it did not bucket the data (the same user in the
>>> same partition).
>>>
>>
>>
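Two points may help here. First, bucketing metadata is kept in the metastore entry created by saveAsTable, not in the Parquet files themselves, so spark.read.parquet() on the raw path cannot recover it; reading via spark.table("sorted_api_logs") on a cluster that shares the same metastore should preserve the bucketing (an assumption based on how bucketed tables are resolved). Second, the invariant the reader expects — the same userId always lands in the same bucket — can be illustrated with a deterministic stand-in hash (Spark actually uses Murmur3, so this is only a sketch):

```python
import zlib

def bucket_for(user_id: str, num_buckets: int) -> int:
    # Deterministic stand-in for Spark's hash partitioning (not Murmur3).
    return zlib.crc32(user_id.encode("utf-8")) % num_buckets

# One user always maps to the same bucket, and buckets stay in range.
print(bucket_for("user-42", 8) == bucket_for("user-42", 8))  # True
print(0 <= bucket_for("another-user", 8) < 8)                # True
```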


Logistic regression in Spark TestCase

2017-11-01 Thread cjn
Hi,
Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x
faster on disk. (Please see the graph below.)

[Graph not included in this archive.]

What kind of test dataset and cluster configuration can reproduce the test
results above? Does anyone know?
And where can I get the test dataset?


Thanks in advance.
Best Regards.
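The benchmark behind that claim is an iterative workload: each gradient-descent pass over logistic regression re-reads the same dataset, which Spark can keep cached in memory while MapReduce re-reads it from disk. A minimal pure-Python sketch of that loop (illustrative only; not the actual benchmark code or dataset):

```python
import math

def train(points, iters=100, lr=0.1):
    """One-feature logistic regression via batch gradient descent."""
    w = 0.0
    for _ in range(iters):            # every iteration scans the full dataset
        grad = 0.0
        for x, y in points:
            p = 1.0 / (1.0 + math.exp(-w * x))
            grad += (p - y) * x       # gradient of the log-loss
        w -= lr * grad / len(points)
    return w

data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
print(train(data) > 0)  # a separable dataset drives the weight positive
```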

Spark application fails with numRecords error

2017-11-01 Thread Serkan TAS
Hi,

I searched for the error in Kafka, but I think it is ultimately related to
Spark, not Kafka.

Has anyone faced an exception that terminates the program with the error
"numRecords must not be negative" while streaming?

Thanks in advance.

Regards.


