Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

2017-08-06 Thread Hyukjin Kwon
Thank you, Shane.

2017-08-06 8:30 GMT+09:00 shane knapp :

> ok, first test to run post-fix is green:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80289/
>
> i'll keep an eye on this worker over the next few days.
>
> shane
>
> On Sat, Aug 5, 2017 at 11:06 AM, shane knapp  wrote:
> > amp-jenkins-worker-05 had pandas 0.20.3 installed for some reason.  it's now
> > been downgraded to 0.19.2 and matches the other workers.
> >
> > shane
> >
> > On Sat, Aug 5, 2017 at 2:01 AM, Liang-Chi Hsieh wrote:
> >>
> >> Maybe a possible fix:
> >> https://stackoverflow.com/questions/31495657/development-build-of-pandas-giving-importerror-c-extension-hashtable-not-bui
> >>
> >>
> >> Hyukjin Kwon wrote
> >>> Hi all,
> >>>
> >>> I am seeing flaky Python tests from time to time and, if I am not
> >>> mistaken, mostly on amp-jenkins-worker-05:
> >>>
> >>>
> >>> ==
> >>> ERROR: test_filtered_frame (pyspark.sql.tests.ArrowTests)
> >>> --
> >>> Traceback (most recent call last):
> >>>   File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 25, in <module>
> >>>     from pandas import hashtable, tslib, lib
> >>> ImportError: cannot import name 'hashtable'
> >>>
> >>> During handling of the above exception, another exception occurred:
> >>>
> >>> Traceback (most recent call last):
> >>>   File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/tests.py", line 3057, in test_filtered_frame
> >>>     pdf = df.filter("i < 0").toPandas()
> >>>   File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 1727, in toPandas
> >>>     import pandas as pd
> >>>   File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 31, in <module>
> >>>     "the C extensions first.".format(module))
> >>> ImportError: C extension: 'hashtable' not built. If you want to import
> >>> pandas from the source directory, you may need to run 'python setup.py
> >>> build_ext --inplace --force' to build the C extensions first.
> >>>
> >>> ==
> >>> ERROR: test_null_conversion (pyspark.sql.tests.ArrowTests)
> >>> --
> >>> ...
> >>>
> >>> ==
> >>> ERROR: test_pandas_round_trip (pyspark.sql.tests.ArrowTests)
> >>> --
> >>> ...
> >>>
> >>> ==
> >>> ERROR: test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
> >>> --
> >>> ...
> >>>
> >>>
> >>> It sounds like an environment problem, apparently due to a missing
> >>> hashtable module (which I believe should have been compiled and be
> >>> importable).
> >>>
> >>> I suspect a few possibilities, such as a bug somewhere or an unsuccessful
> >>> manual build from the pandas source, but I am unable to reproduce or
> >>> verify this. So this is only my guess.
> >>>
> >>>
> >>> Does anyone know if this is an environment problem and how to fix this?
> >>
> >>
> >>
> >>
> >>
> >> -
> >> Liang-Chi Hsieh | @viirya
> >> Spark Technology Center
> >> http://www.spark.tc/
> >> --
> >> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Question-Flaky-tests-pyspark-sql-tests-ArrowTests-tests-in-Jenkins-worker-5-tp22085p22086.html
> >> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>
>
>
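The root cause reported in this thread was one worker carrying a different pandas version from the others. A minimal sketch of a per-worker sanity check follows. It assumes that comparing `__version__` against the version the other workers use (0.19.2, per the message above) is enough to spot the mismatch; the helper name `check_module` is hypothetical and not part of Spark or the Jenkins setup.

```python
import importlib


def check_module(name, expected_version=None):
    """Import `name` and report whether it loads and matches the expected version.

    Hypothetical helper for illustration only; it never raises on ImportError.
    """
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # A broken install (e.g. a C extension that was not built) lands here.
        return {"name": name, "ok": False, "version": None, "error": str(exc)}

    version = getattr(module, "__version__", None)
    if expected_version is not None and version != expected_version:
        return {
            "name": name,
            "ok": False,
            "version": version,
            "error": "expected %s, found %s" % (expected_version, version),
        }
    return {"name": name, "ok": True, "version": version, "error": None}


if __name__ == "__main__":
    # 0.19.2 is the version the other workers were reported to run.
    print(check_module("pandas", expected_version="0.19.2"))
```

Running this with the same interpreter the tests use (the py3k environment in the traceback) would report either the import error or the version mismatch without needing a full test run.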


Re: the uniqueSource in StreamExecution, where is it changed, please?

2017-08-06 Thread ??????????
Hi Hsieh,

I see.
Thanks for your reply.
 
---Original---
From: "Liang-Chi Hsieh"
Date: 2017/8/5 17:14:16
To: "dev";
Subject: Re: the uniqueSource in StreamExecution, where is it changed, please?



Not sure if you are looking for how the returned value of `getOffset`
changes.

I think it depends on how the actual `Source` classes implement it. For
example, in `FileStreamSource`, you can see `getOffset` is updated by
finding new files in the source.

Different sources have different ways of computing their offsets.
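To make the point concrete, here is a minimal sketch in Python rather than Spark's Scala. It is only an illustration of the idea, not the actual StreamExecution code: the classes `FileSource` and `CounterSource` and the function `construct_next_batch` are hypothetical stand-ins. Nothing assigns to the offset from outside; each source recomputes it when asked.

```python
class FileSource:
    """Toy file-based source: its offset is the number of distinct files seen so far."""

    def __init__(self, list_files):
        self._list_files = list_files  # callable returning the current file names
        self._seen = set()

    def get_offset(self):
        # Each call re-scans the listing, so the offset advances when new files appear.
        self._seen.update(self._list_files())
        return len(self._seen) if self._seen else None


class CounterSource:
    """Toy source with a different offset rule: a counter bumped by push()."""

    def __init__(self):
        self._count = 0

    def push(self, n=1):
        self._count += n

    def get_offset(self):
        return self._count if self._count else None


def construct_next_batch(unique_sources):
    """Ask every source for its latest offset and keep those that have data."""
    latest_offsets = {}
    for source in unique_sources:
        offset = source.get_offset()
        if offset is not None:
            latest_offsets[source] = offset
    return latest_offsets


if __name__ == "__main__":
    files = ["part-0"]
    file_source = FileSource(lambda: list(files))
    counter_source = CounterSource()
    print(construct_next_batch([file_source, counter_source]))
    files.append("part-1")  # a new file arrives; no one touches the source itself
    counter_source.push(3)
    print(construct_next_batch([file_source, counter_source]))
```

The second call returns larger offsets even though the caller never modified the sources directly, which is why no explicit assignment shows up in the polling method.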



Robin wrote
> Hi all,
> 
> 
> These days I am studying the StreamExecution code.
> In the method constructNextBatch (around line 365), I see that the value of
> latestOffsets changes, but I cannot find where the s.getOffset of
> uniqueSource is changed.
> Here is the link to the code:
> 
> 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
> 
> 
> 
> Could you please help me understand it?
> 
> 
> Thanks.
> Robin





-
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/the-uniqueSource-in-StreamExecution-where-is-it-be-changed-please-tp22084p22087.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
