Re: Two Spark applications listen on the same port on the same machine

2019-03-07 Thread Moein Hosseini
I'm sure only the first one listens on the port, but in the master UI both
applications redirect to the same machine and the same port. When I checked
the URLs, they both redirect to the application UI of the first submitted one.
So I think it may only be a problem in the UI.
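
This is a quick way to double-check from the same machine which UI ports are
really open (just a throwaway probe; Spark normally bumps the second driver
UI to 4041 when 4040 is already taken):

    import java.net.Socket
    import scala.util.Try

    // Probe the usual driver UI ports; Spark retries 4041, 4042, ... when
    // 4040 is already bound by another driver on the same machine.
    Seq(4040, 4041).foreach { p =>
      val open = Try { new Socket("localhost", p).close(); true }.getOrElse(false)
      println(s"port $p open: $open")
    }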

On Wed, Mar 6, 2019, 10:29 PM Sean Owen  wrote:

> Two drivers can't be listening on port 4040 at the same time -- on the
> same machine. The OS wouldn't allow it. Are they actually on different
> machines or somehow different interfaces? or are you saying the reported
> port is wrong?
>
> On Wed, Mar 6, 2019 at 12:23 PM Moein Hosseini  wrote:
>
>> I've submitted two Spark applications to a standalone cluster of 3 nodes at
>> nearly the same time (I have a bash script that submits them one after the
>> other, without delay). But something goes wrong. In the master UI, the
>> Running Applications section shows both of my jobs with the correct
>> configuration (cores, memory, and distinct application IDs), but both
>> redirect to port 4040, which is listened on by the second submitted job.
>> I think it could be a race condition in the UI, but I found nothing in the
>> logs. Could you help me investigate where I should look for the cause?
>>
>> Best Regards
>> Moein
>>


Two Spark applications listen on the same port on the same machine

2019-03-06 Thread Moein Hosseini
I've submitted two Spark applications to a standalone cluster of 3 nodes at
nearly the same time (I have a bash script that submits them one after the
other, without delay). But something goes wrong. In the master UI, the Running
Applications section shows both of my jobs with the correct configuration
(cores, memory, and distinct application IDs), but both redirect to port 4040,
which is listened on by the second submitted job.
I think it could be a race condition in the UI, but I found nothing in the
logs. Could you help me investigate where I should look for the cause?
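
One workaround I could try is pinning each driver UI to its own port, roughly
like this (spark.ui.port is a standard Spark property; the app name is just a
placeholder):

    import org.apache.spark.sql.SparkSession

    // Give every submitted application an explicit, distinct driver UI port
    // instead of letting both race for the default 4040.
    val spark = SparkSession.builder()
      .appName("job-a")
      .config("spark.ui.port", "4050") // the second app would use e.g. 4051
      .getOrCreate()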

Best Regards
Moein

-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com
linkedin: https://www.linkedin.com/in/moeinhm
twitter: https://twitter.com/moein7tl


Re: [VOTE] [RESULT] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-02-12 Thread Moein Hosseini
++1 from me.

On Wed, Feb 13, 2019 at 2:19 AM Xiangrui Meng  wrote:

> Hi all,
>
> The vote passed with the following +1s (* = binding) and no 0s/-1s:
>
> * Denny Lee
> * Jules Damji
> * Xiao Li*
> * Dongjoon Hyun
> * Mingjie Tang
> * Yanbo Liang*
> * Marco Gaido
> * Joseph Bradley*
> * Xiangrui Meng*
>
> Please watch SPARK-25994 and join future discussions there. Thanks!
>
> Best,
> Xiangrui
>


-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com
linkedin: https://www.linkedin.com/in/moeinhm
twitter: https://twitter.com/moein7tl


Re: Feature request: split dataset based on condition

2019-02-02 Thread Moein Hosseini
I don't see it as a method that applies the filtering multiple times; instead
it would be a semi-action, not just a transformation. Imagine something like
mapPartitions, but accepting multiple lambdas, where each lambda collects its
rows for its own dataset (or something along those lines). Is that possible?
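
To make it concrete, here is a rough sketch of the kind of helper I have in
mind, next to what is possible today (splitBy is a hypothetical name, not a
real Spark API; today it can only cache once and fall back to one filter per
predicate):

    import org.apache.spark.sql.Dataset

    // Hypothetical helper, not in Spark: split a Dataset by several
    // predicates. Caching avoids re-reading the source, but each output
    // still materializes through its own filter job.
    def splitBy[T](ds: Dataset[T])(preds: (T => Boolean)*): Seq[Dataset[T]] = {
      ds.cache()
      preds.map(p => ds.filter(p))
    }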

On Sat, Feb 2, 2019 at 5:59 PM Sean Owen  wrote:

> I think the problem is that you can't produce multiple Datasets from one
> source in one operation -- consider that reproducing one of them would mean
> reproducing all of them. You can write a method that does the filtering
> multiple times, but it wouldn't be faster. What do you have in mind that's
> different?
>
> On Sat, Feb 2, 2019 at 12:19 AM Moein Hosseini  wrote:
>
>> I've seen many applications that need to split a dataset into multiple
>> datasets based on some conditions. As there is no method to do this in one
>> place, developers use the *filter* method multiple times. I think it would
>> be useful to have a method that splits a dataset based on a condition in a
>> single iteration, something like Scala's *partition* method (of course,
>> Scala's partition just splits a list into two lists, but something more
>> general could be even more useful).
>> If you think it could be helpful, I can create a Jira issue and work on it
>> to send a PR.
>>
>> Best Regards
>> Moein
>>

-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com
linkedin: https://www.linkedin.com/in/moeinhm
twitter: https://twitter.com/moein7tl


Feature request: split dataset based on condition

2019-02-01 Thread Moein Hosseini
I've seen many applications that need to split a dataset into multiple
datasets based on some conditions. As there is no method to do this in one
place, developers use the *filter* method multiple times. I think it would be
useful to have a method that splits a dataset based on a condition in a single
iteration, something like Scala's *partition* method (of course, Scala's
partition just splits a list into two lists, but something more general could
be even more useful).
If you think it could be helpful, I can create a Jira issue and work on it to
send a PR.
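
For example, today I end up writing something like this (a toy, local-mode
sketch):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-example")
      .getOrCreate()
    import spark.implicits._

    val ds = spark.range(100).map(_.toInt)

    // Two separate filters mean two passes over the source unless ds is
    // cached; a partition-like method could produce both in one iteration.
    val evens = ds.filter(_ % 2 == 0)
    val odds  = ds.filter(_ % 2 != 0)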

Best Regards
Moein

-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com
linkedin: https://www.linkedin.com/in/moeinhm
twitter: https://twitter.com/moein7tl


Why do outdated third-party projects exist in the documentation?

2019-01-28 Thread Moein Hosseini
Hi everyone,

I was taking a look at the Spark documentation about third-party projects
<http://spark.apache.org/third-party-projects.html> and monitoring
<http://spark.apache.org/docs/latest/monitoring.html> and realized that many
of the listed projects are discontinued.
For example, BlinkDB <https://github.com/sameeragarwal/blinkdb> has had no
commits in the last 5 years, and the last release of Ganglia
<http://ganglia.info/> (a monitoring tool) was in 2015.
Is there any plan to keep using such old-school tools, or should we remove
them from the documentation?
-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com
linkedin: https://www.linkedin.com/in/moeinhm
twitter: https://twitter.com/moein7tl