I would be open to working on the Dataset documentation if no one else is
already working on it. Thoughts?
On Fri, Jun 17, 2016 at 11:44 PM, Cheng Lian wrote:
As mentioned in the PR description, this is just an initial PR to bring the
existing content up to date, so that people can add more content
incrementally.
We should definitely cover more about Dataset.
Cheng
On 6/17/16 10:28 PM, Pedro Rodriguez wrote:
The updates look great!
Looks like many places are updated to the new APIs, but there still isn't a
section for working with Datasets (most of the docs work with DataFrames).
Are you planning on adding more? I am thinking something that would address
common questions like the one I posted on the
Hey Pedro,
The SQL programming guide is being updated. Here's the PR, though it's not
merged yet: https://github.com/apache/spark/pull/13592
Cheng
On 6/17/16 9:13 PM, Pedro Rodriguez wrote:
Hi All,
At my workplace we are starting to use Datasets in 1.6.1 and even more
with Spark 2.0 in place of
Dear all,
I have three questions about the equality of org.apache.spark.sql.Row.
(1) If a Row has a complex type (e.g. Array), is the following behavior
expected?
If two Rows have the same array instance, Row.equals returns true in the
second assert. If two Rows have different array instances (a1
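If the behavior described above holds, it is consistent with JVM arrays comparing by reference in equals() rather than by contents. As a hedged illustration (plain Python, not Spark's actual Row implementation; the Ref/Row classes here are hypothetical stand-ins), the contrast between "same instance" and "equal contents in different instances" looks like this:

```python
class Ref:
    """Wrapper that mimics JVM-array reference equality."""
    def __init__(self, values):
        self.values = values
    def __eq__(self, other):
        return self is other  # identity, like Java array equals()

class Row:
    """Toy row: compares fields element-wise, deferring to each field's __eq__."""
    def __init__(self, *fields):
        self.fields = fields
    def __eq__(self, other):
        return len(self.fields) == len(other.fields) and all(
            a == b for a, b in zip(self.fields, other.fields))

shared = Ref([1, 2, 3])
assert Row(shared) == Row(shared)                  # same array instance
assert Row(Ref([1, 2, 3])) != Row(Ref([1, 2, 3]))  # equal contents, distinct instances
```

So whether this is "expected" largely depends on whether Row is supposed to compare array fields structurally or defer to the field's own equals().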
I am going to take a guess that this means your partitions within an RDD are
not balanced (one or more partitions are much larger than the rest). This
would mean a single core has to do much more work than the rest, leading to
poor performance. In general, the way to fix this is to
Hi All,
At my workplace we are starting to use Datasets in 1.6.1, and even more with
Spark 2.0, in place of DataFrames. I looked at the 1.6.1 documentation and
then the 2.0 documentation, and it looks like not much time has been spent
writing a Dataset guide/tutorial yet.
Preview Docs:
Another good signal is the "target version" (which by convention is only
set by committers). When I set this for the upcoming version, it means I
think it's important enough that I will prioritize reviewing a patch for it.
On Fri, Jun 17, 2016 at 3:22 PM, Pedro Rodriguez
What is the best way to determine what the library maintainers believe is
important work to be done?
I have looked through the JIRA and it's unclear which items are priorities
one could work on. I am guessing this is in part because things are a little
hectic with final work for 2.0, but it would
Docker Integration Tests failed on Linux:
http://pastebin.com/Ut51aRV3
Here was the command I used:
mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Psparkr
-Dhadoop.version=2.7.0 package
Has anyone seen a similar error?
Thanks
On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin
-1 (non-binding)
SPARK-16017 shows a severe perf regression in YARN compared to 1.6.1.
On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.2!
>
> The vote is open until Sunday, June 19, 2016 at
+1 (non-binding)
On Thu, Jun 16, 2016 at 9:49 PM Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.2!
>
> The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
If you have a clean test case demonstrating the desired behavior, and
a change which makes it work that way, yes make a JIRA and PR.
On Fri, Jun 17, 2016 at 1:35 AM, Luyi Wang wrote:
> Hey there:
>
> The frequent items function in the DataFrame stat package seems inaccurate. In the
>
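For context on why "inaccurate" results may actually be expected: freqItems is documented as an approximate, one-pass algorithm (based on the frequent-element counting scheme of Karp, Schenker, and Papadimitriou) that can return false positives. A minimal Misra-Gries-style sketch in plain Python (an illustrative analogue, not Spark's implementation) shows the trade-off:

```python
def freq_items(stream, k):
    """One-pass sketch: any item occurring more than len(stream)/k times is
    guaranteed to be returned, but false positives are possible."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter; drop the ones that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)

stream = ["a"] * 50 + ["b"] * 30 + list("cdefg")
result = freq_items(stream, k=3)
assert "a" in result  # occurs > n/k times, so guaranteed present
assert "b" in result
```

So a clean test case should distinguish a genuine bug (a truly frequent item being missed) from the documented false-positive behavior.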
I think that's OK to change, yes. I don't see why it's necessary to
init log_ the way it is now. initializeLogIfNecessary() has a purpose
though.
On Fri, Jun 17, 2016 at 2:39 AM, Prajwal Tuladhar wrote:
> Hi,
>
> The way the log instance inside the Logging trait is currently being
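The point of an initializeLogIfNecessary()-style guard is that initialization runs at most once even when many threads touch the logger for the first time concurrently, while the accessor stays cheap afterwards. A hedged sketch of that pattern in plain Python (an analogue of the idea, not Spark's Logging trait):

```python
import threading

class LazyLogger:
    """Lazily initializes its underlying log exactly once, even under
    concurrent first access (double-checked locking)."""
    def __init__(self):
        self._log = None
        self._lock = threading.Lock()
        self.init_count = 0  # for demonstration: how many times init ran

    def log(self):
        if self._log is None:            # fast path: no lock once initialized
            with self._lock:
                if self._log is None:    # re-check under the lock
                    self.init_count += 1
                    self._log = object()  # stand-in for real logger setup
        return self._log

logger = LazyLogger()
threads = [threading.Thread(target=logger.log) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert logger.init_count == 1  # initialized exactly once
```

Changing how the instance itself is created is likely fine as long as this once-only, thread-safe guarantee is preserved.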
Cody has graciously worked on a new connector for dstream for Kafka 0.10.
Can people that use Kafka test this connector out? The patch is at
https://github.com/apache/spark/pull/11863
Although we have stopped merging new features into branch-2.0, this
connector is very decoupled from the rest of
The issue has been fixed. After a lot of digging around, I finally found the
pretty simple thing causing this problem: it was a permission issue on the
Python libraries. The user I was logged in as did not have enough permission
to read/execute the following Python libraries.